Automatic data collection in lexical typology
The paper addresses the issue of automatic data collection for lexical typological studies within the Frame approach. Research in this framework is based on the analysis of the distributional properties of the lexemes in question. Hence, questionnaires for such studies consist of typical contexts in which lexical items from a given semantic domain can potentially occur. We aim to fill in these questionnaires automatically, and this task can be split into two problems: translating the questionnaire and filling it with relevant data. We propose three methods for the first task (translation via bilingual dictionaries, via online cloud translators, or via parallel corpora) and two algorithms for the second (filling in a questionnaire based on monolingual corpora or on online translators). We test our algorithms on data from four semantic domains of qualitative features (‘sharp’, ‘smooth’, ‘thick’, ‘thin’).
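To make the two-step pipeline concrete, here is a minimal sketch pairing dictionary-based questionnaire translation with corpus-based filling. Everything in it is an assumption for illustration: the questionnaire is reduced to head nouns, the bilingual dictionary and the lemmatized, transliterated Russian-like corpus are invented toy data, the function names are hypothetical, and scoring candidates by counts of adjacent adjective-noun pairs is a simplification, not the authors' algorithm.

```python
from collections import Counter

# Hypothetical questionnaire for the domain 'sharp': each typical context is
# reduced to the English noun that the adjective would modify.
QUESTIONNAIRE = ["knife", "angle", "hedgehog", "pencil"]

# Hypothetical bilingual dictionary (English -> target-language lemmas).
# "pencil" is deliberately missing to show an untranslated context.
BILINGUAL_DICT = {
    "knife": ["nozh"],
    "angle": ["ugol"],
    "hedgehog": ["ezh"],
}

# Candidate target-language adjectives of the domain, assumed known in advance.
CANDIDATE_ADJECTIVES = ("ostryj", "kolyuchij")

# Invented toy corpus, assumed lemmatized (one list of lemmas per sentence).
CORPUS = [
    ["ostryj", "nozh", "lezhat", "na", "stol"],
    ["ostryj", "ugol", "treugolnik"],
    ["kolyuchij", "ezh", "spat"],
    ["on", "vzyat", "ostryj", "nozh"],
]


def translate_questionnaire(contexts, dictionary):
    """Dictionary-based step: map each context noun to its target candidates."""
    return {noun: dictionary.get(noun, []) for noun in contexts}


def count_adjacent_pairs(corpus):
    """Count every pair of adjacent lemmas in the corpus."""
    pairs = Counter()
    for sentence in corpus:
        pairs.update(zip(sentence, sentence[1:]))
    return pairs


def fill_questionnaire(translated, corpus, adjectives):
    """Corpus-based step: pick the adjective seen most often right before each noun."""
    pairs = count_adjacent_pairs(corpus)
    filled = {}
    for noun, targets in translated.items():
        # Sum adjective + noun counts over all translation candidates of the noun.
        scores = {adj: sum(pairs[(adj, t)] for t in targets) for adj in adjectives}
        best_adj, best_count = max(scores.items(), key=lambda item: item[1])
        # Leave the cell empty if the corpus gives no evidence at all.
        filled[noun] = best_adj if best_count > 0 else None
    return filled


if __name__ == "__main__":
    translated = translate_questionnaire(QUESTIONNAIRE, BILINGUAL_DICT)
    print(fill_questionnaire(translated, CORPUS, CANDIDATE_ADJECTIVES))
    # prints: {'knife': 'ostryj', 'angle': 'ostryj', 'hedgehog': 'kolyuchij', 'pencil': None}
```

Adjacency is used only to keep the sketch short; a realistic pipeline would presumably need syntactic parsing to catch adjective-noun pairs that are not adjacent, and a translator- or parallel-corpus-based first step could replace `translate_questionnaire` without changing the filling step.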
The article deals, from a typological perspective, with verbs describing sounds made by inanimate objects (cf. the noise of a door being opened, of coins in somebody’s pocket, of a river, etc.). The analysis is based on data from four languages (Russian, German, Komi-Zyrjan, Khanty), obtained from dictionaries, corpora and fieldwork. We first discuss the primary meanings of these verbs and identify the parameters underlying the semantic distinctions between them (the type of sound source and its features, the type of situation causing the emission of a sound, and the acoustic properties of the sound). We then consider the derived meanings of sound verbs, which develop through metonymic and metaphoric shifts, and analyze the mechanisms behind each of these shifts. Finally, we examine a type of semantic change in our data that cannot be explained by either of these mechanisms and hence represents a separate kind of meaning shift.
The starting point of the study is the hypothesis that Church Slavonic and the Christian religious discourse of modern Russian are discursively close. Analysing lexical structure with quantitative corpus methods, we show that the latter is closer to Church Slavonic than mainstream modern Russian is. This can serve as evidence of the specificity of the register in question and as an additional argument for granting it a separate status. The research is based on material from the Russian National Corpus, namely the Church Slavonic corpus, the Main corpus and the subcorpus of church and theology texts. Using the log-likelihood criterion and PCA visualizations, we identify the body of lexemes in Russian texts that can be considered Slavonicisms (tserkovnoslavyanizmy) and show that the "distance" between the corpora comes out differently when adjectives, nouns and verbs are analysed separately.
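The abstract does not spell out the formula behind the log-likelihood criterion, so the sketch below assumes the common two-corpus form of the statistic used for keyness: with observed counts O_i of a lexeme in corpora of N_i tokens and expected counts E_i = N_i (O_1 + O_2) / (N_1 + N_2), the score is G² = 2 Σ O_i ln(O_i / E_i). The function names and the counts in the usage example are hypothetical, not taken from the study.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Log-likelihood (G2) for one lexeme's frequency in corpus A vs corpus B.

    freq_* are the lexeme's counts, size_* the corpus sizes in tokens.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the lexeme were equally frequent in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def more_frequent_in_a(freq_a: int, size_a: int, freq_b: int, size_b: int) -> bool:
    """G2 is unsigned, so compare relative frequencies to get the direction."""
    return freq_a / size_a > freq_b / size_b


if __name__ == "__main__":
    # Invented counts: 120 hits in a 1M-token corpus A, 30 hits in a 2M-token corpus B.
    g2 = log_likelihood(120, 1_000_000, 30, 2_000_000)
    print(round(g2, 2), more_frequent_in_a(120, 1_000_000, 30, 2_000_000))
    # prints: 137.87 True
```

Under this reading, candidate Slavonicisms would be lexemes with a high G² that are also relatively more frequent in the religious texts than in the Main corpus; G² values above 3.84 correspond to p < 0.05 at one degree of freedom.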
Temperature phenomena are universal, relatively easy for humans to perceive and crucial for them, but their conceptualisation involves a complex interplay between external reality, bodily experience and the evaluation of the relevant properties with regard to their functions in human life. The meanings of temperature terms are thus both embodied and perspectival. Rather than reflecting the external world objectively, they offer a naïve picture of it, permeated with folk theories that are based on people’s experience and rooted in their culture (cultural models). Languages differ in how many temperature terms they have and in how these terms categorize the temperature domain in general. Closely related languages can show remarkable differences in their uses of temperature adjectives, even when these are cognates; conversely, temperature systems can show remarkable areal patterns. Temperature terms can belong to different word classes, even within one and the same language (adjectives such as ‘cold’, verbs such as ‘to freeze’, nouns such as ‘coldness’). Languages vary in their word-class attribution of temperature concepts: many languages, for instance, lack temperature adjectives. Word-class attribution, the lexicalization of temperature expressions and the syntactic constructions in which they can be used are all sensitive to their semantics.
Temperature meanings are often semantically related to other meanings, either synchronically (within a polysemous lexeme) or diachronically. Thus, temperature concepts often serve as source domains for various metaphors and are extended to other perceptual modalities (‘hot spices’, ‘warm colour’). Temperature meanings can also develop from others, e.g. ‘burn, fire’ > ‘hot’ or ‘ice’ > ‘cold’. Finally, the meanings of temperature terms can also change within the temperature domain itself, e.g. ‘warm, hot’ > ‘lukewarm’, as in Lat. tep- ‘warm’ vs. English tepid ‘lukewarm’. While some languages show extensive semantic derivation from the temperature domain, others lack it or use it only to a limited degree. Languages vary as to which temperature term has predominantly positive associations in its extended uses (cf. ‘cold’ in Wolof vs. ‘warm’ in the European languages), partly due to differences in climate.
Temperature terms have, on the whole, received relatively little attention. Cross-linguistic research on temperature is mainly restricted to Sutrop (1998, 1999) and Plank (2003), which focus on how many basic temperature terms a language has and how they divide the domain among themselves. Apart from passing remarks, there has been no cross-linguistic research on the grammatical behaviour of temperature expressions.
In theoretical semantics, temperature adjectives have mainly figured in discussions of lexical fields, antonymy and linguistic scales (cf. Lehrer 1970, Cruse & Togia 1995, Sutrop 1998, cf. also Clausner & Croft 1999). Koptjevskaja-Tamm & Rakhilina 2006 suggest that the linguistic categorization of the temperature domain is sensitive to several parameters that are important and salient for humans and can be distinguished by simple procedures relating to the human body. Within the Natural Semantic Metalanguage, Goddard & Wierzbicka (2006) propose a general formula for describing the language-specific meanings of temperature terms via reference to fire.
Extended uses of temperature words have been studied indirectly in cognitive linguistics, primarily in research on the metaphors underlying emotions, e.g. AFFECTION IS WARMTH (Lakoff & Johnson 1997:50) and ANGER IS HEAT (Kövecses 1995, also Goossens 1998; cf. also Shindo 1998-99). An important question raised in Geeraerts & Grondelaers (1995) is to what degree such extensions reflect universal metaphorical patterns or are based on common cultural traditions. The current empirical evidence for the suggested metaphors is still relatively meagre.
The article examines the relationship between time and space in language on the basis of adjectives denoting high or low speed in Russian and other (mostly Slavic) languages. In physics, speed is defined in terms of time and space (distance per unit of time). It is argued, however, that in natural language speed is primarily a temporal concept, involving a comparison of the temporal properties of a ‘target situation’ with those of a ‘norm’. Speed terms are shown to develop their own metaphors and metonymies and subsequently to become connectors and intensifying markers. This has important theoretical implications, since it demonstrates that the domain of time is less dependent on space than the traditional view suggests.
The paper presents a project aimed at developing a Russian Learner Parallel Corpus, reviews existing analogues, and describes the corpus's current status and the tasks for which it could be used. Existing parallel corpora contain (comparatively) “correct” translations, whereas the aim of the present project is to create a sufficiently large corpus of imperfectly translated Russian and English texts together with their sources, and to use it as a tool for translation studies, especially research on translation mistakes. The new corpus will be a valuable resource for computational linguistics, as it offers another source of evaluation data that could be used to improve machine translation systems. The corpus is already available online; it contains nearly half a million word tokens and continues to grow. The main source of material is translations made by student translators at Russian universities.
In this article we present the results of research into the discourse features of a lexico-semantic group of synonyms denoting a human being: human being, person, individual, personality and man. Language corpora were the main analytical tool: they made it possible not only to determine more precisely the functional styles in which the lexemes tend to be used, but also to describe the thematic characteristics of the texts in which the analysed lexical units are most frequent.