Russian in the English mirror: (non)grammatical constructions in learner Russian
The main trends and achievements in corpus linguistics are presented in this collection of abstracts of plenaries, papers and posters presented at the 8th international conference Corpus Linguistics - 2015 (Lancaster University, UCREL, July 2015).
The paper proposes a corpus analysis of the Russian adjective slavnyj. Its semantic evolution is analyzed through its distribution in texts from the XVIII-XXI centuries, including its main types of usage, its main meanings, and possible shifts from one meaning to another. It is shown that the initial semantics of ‘being famous’, which the adjective slavnyj expresses up to the beginning of the XIX century, gives rise to the idea of strong positive evaluation. Slavnyj is very frequent as a positive marker during the XIX century, and then it gradually loses its intrinsic expressiveness. Nowadays, this adjective has become much less frequent, having undergone a peculiar meaning shift: it marks a moderate compliment. While the grammaticalization pattern of slavnyj represents a well-known shift ‘famous’ => ‘good’ (a specific case of the more general pattern ‘differing from the others’ => ‘good’) that is widely attested crosslinguistically, the further stage of the semantic evolution of the word slavnyj appears to be more exotic.
The volume is the third issue of a corpus-based grammar of Russian. It deals with parts of speech and, more generally, with formal classes of the lexicon. It comprises descriptive papers on individual parts of speech and lesser word classes.
Pragmatic markers (PMs) mainly influence the pragmatic aspect of communication and are mostly devoid of referential meaning of their own. These markers are indispensable elements of oral communication in any language. The article suggests a typology of pragmatic markers for Russian everyday speech that includes 10 basic types. The frequency of the various marker types is studied on the basis of two representative speech corpora: the corpus of Russian Everyday Speech “One Speech Day” (ORD) and the “Balanced Annotated Collection of Texts” (SAT). Preliminary data on PM distribution in dialogues and monologues were obtained, and the article describes the main difficulties one comes across while annotating PMs according to our methodology. The main requirements for creating a Dictionary of Pragmatic Markers are enumerated. The paper indicates the scope of pragmatic markers and further prospects for their use, which include (but are not limited to) dataset labelling for the development of voice assistants and speech recognition systems.
This paper describes some aspects of the West Circassian Corpus, an electronic resource containing annotated texts in West Circassian (Adyghe). We focus on ways in which the corpus can serve as an important instrument for teaching West Circassian. In particular, it is shown that the West Circassian Corpus can be used for compiling exercises, for checking actual language use, and for organizing small-scale research projects devoted to the functioning of West Circassian. The paper contains some examples of the use of the West Circassian Corpus for teaching purposes.
The study examines tense variation in complement and subject clauses subordinate to and co-temporal with matrix past tense verbs in Russian. The semantics of the matrix verb is commonly named as one of the major factors that govern tense choice in complement and subject clauses: verbs of speech are said to license exclusively the present tense in the embedded clause; existential verbs, on the other hand, are said to block the present tense; whereas verbs of perception are said to allow both past and present tense, cf. Vanja skazal, čto Maša xorošo vygljadit ‘Vanya said that Masha looked [pres] great’ vs. Slučalos’, čto Maša vygljadela xorošo ‘It happened (there were times) that Masha looked [past] great’ vs. Vanja videl, čto Maša xorošo vygljadit/vygljadela ‘Vanya saw that Masha looked [pres/past] great’. However, despite a considerable body of research on the topic, a comprehensive investigation of tense distribution across various semantic classes of matrix verbs has not yet been undertaken. This paper presents a corpus-based analysis of tense distribution in complement and subject clauses across five semantic classes of the matrix verb: speech, mental, emotion, perception, and existential. Statistical analysis revealed the following probabilistic hierarchy of licensing past tense: [existential verbs (97%) > verbs of perception + complementizer kak (70%) > verbs of perception + complementizer čto (41%) > mental verbs (9%) > verbs of speech (1%)]. This hierarchy rectifies our notion of tense choice in complement and subject clauses in Russian. It is also notable for its high correspondence with the interclausal bondedness hierarchy maintained in typological studies. The suggested isomorphism of the two hierarchies implies that tense appears to be a probabilistic marker of interclausal bondedness, with absolute tense encoding closer relations and relative tense encoding looser ones.
We analyze the dynamics of dialect loss in a cluster of villages in rural northern Russia based on a corpus of transcribed interviews, the Ustja River Basin Corpus. Eleven phonological and morphological variables are analyzed across 33 speakers born between 1922 and 1996 in a series of logistic regression models. We propose three characteristics for a comparison of the rate of loss of different variables: initial level, steepness, and turning point. We show that the dynamics of loss differs significantly across variables and discuss possible reasons for such differences, including perceptual salience, initial variation in the dialect, and convergence with regionally or socially defined varieties of Russian. In conclusion, we discuss the pros and cons of logistic regression as an approach to quantitative modelling of dialect loss. Our paper contributes to the study and documentation of Russian dialects, most of which are on the verge of extinction.
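As a rough illustration of the three characteristics named in the abstract above, the sketch below reads them off a single logistic curve over birth year. This is one possible operationalization, not the authors' definitions: here the initial level is taken as the predicted probability for the oldest birth year, steepness as the slope coefficient, and the turning point as the birth year at which the predicted probability crosses 0.5. The coefficients are invented for the example and are not results from the Ustja River Basin Corpus.

```python
import math


def dialect_probability(birth_year, intercept, slope):
    """Logistic curve: probability that a speaker born in `birth_year`
    produces the dialect variant of some variable."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * birth_year)))


def curve_characteristics(intercept, slope, first_year=1922):
    """Return (initial_level, steepness, turning_point) for one variable.

    Assumed definitions (hypothetical, for illustration only):
    - initial_level: predicted probability for the oldest birth year
    - steepness: the slope coefficient on the logit scale
    - turning_point: birth year where the predicted probability is 0.5
    """
    initial_level = dialect_probability(first_year, intercept, slope)
    steepness = slope
    turning_point = -intercept / slope
    return initial_level, steepness, turning_point


# Invented coefficients for a single hypothetical variable.
intercept, slope = 196.0, -0.1
initial, steep, turn = curve_characteristics(intercept, slope)

print(f"initial level: {initial:.3f}")
print(f"steepness:     {steep}")
print(f"turning point: {turn:.1f}")
```

Comparing these three numbers across variables would show which dialect features start out more robust, which decline faster, and which begin to decline in earlier generations.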
The book includes 64 papers submitted to the International Conference on Computational Linguistics and Intellectual Technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research on natural language description, language simulation, and the creation of applied computer technologies.