State Languages of Russia in Wikipedia: On the Online Activity of Minority Language Communities
Poetic texts pose a challenge to full morphological tagging and lemmatization because their authors extend the vocabulary, use morphologically and semantically defective forms, go beyond standard syntactic templates, and employ non-projective constructions and non-standard word order, among other devices of creative language play. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF, and neural network algorithms, as well as a state-of-the-art dictionary-based tagger. The taggers were trained on prose and tested on three poetic samples of varying complexity. First, we propose a method for compiling gold-standard datasets for Russian poetry. Second, we focus on the taggers' performance in assigning part-of-speech tags and lemmas. We show which POS classes, paradigm classes, and syntactic patterns most affect processing quality.
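To make the evaluation setup more concrete, the following Python sketch scores a tagger's output against a gold standard, reporting POS and lemma accuracy separately and tallying POS errors by gold class. It is a minimal illustration, not the authors' evaluation code: it assumes the gold and predicted sequences share the same tokenization, and the `Token` record, the `evaluate` function, and the Universal Dependencies-style tag names are hypothetical choices made for this example.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str


def evaluate(gold: list[Token], predicted: list[Token]) -> dict:
    """Compare tagger output with a gold standard, token by token.

    Assumes both sequences use identical tokenization, so the i-th
    predicted token corresponds to the i-th gold token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold:
        raise ValueError("cannot evaluate an empty sample")

    pos_correct = 0
    lemma_correct = 0
    pos_errors: Counter[str] = Counter()  # keyed by the gold POS class

    for g, p in zip(gold, predicted):
        if g.pos == p.pos:
            pos_correct += 1
        else:
            pos_errors[g.pos] += 1
        if g.lemma == p.lemma:
            lemma_correct += 1

    n = len(gold)
    return {
        "pos_accuracy": pos_correct / n,
        "lemma_accuracy": lemma_correct / n,
        "pos_errors_by_gold_class": dict(pos_errors),
    }


if __name__ == "__main__":
    # Invented example: a gold analysis of "Мороз и солнце; день чудесный!"
    # (punctuation omitted) and a hypothetical tagger output with one POS
    # error and one lemma error.
    gold = [
        Token("Мороз", "мороз", "NOUN"),
        Token("и", "и", "CCONJ"),
        Token("солнце", "солнце", "NOUN"),
        Token("день", "день", "NOUN"),
        Token("чудесный", "чудесный", "ADJ"),
    ]
    predicted = [
        Token("Мороз", "мороз", "NOUN"),
        Token("и", "и", "PART"),
        Token("солнце", "солнц", "NOUN"),
        Token("день", "день", "NOUN"),
        Token("чудесный", "чудесный", "ADJ"),
    ]
    print(evaluate(gold, predicted))
```

On this invented example the function reports 0.8 for both POS and lemma accuracy (4 of 5 tokens correct) and records the single POS error under `CCONJ`. Keying errors by the gold class rather than the predicted tag is a design choice aimed at the question the abstract raises: which classes are hard to process.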
This chapter deals with segmentation, the definition of reference units, and the annotation of the first corpus of Russian narratives produced by people with brain damage (people with aphasia and people with right hemisphere damage) and by neurologically healthy speakers. We show that parameters such as pause length and intonation contours cannot be used to segment impaired speech. Instead, we use syntactic criteria to identify the reference units, which we call elementary discourse units (EDUs). The Russian CliPS (Clinical Pear Stories) corpus contains multi-layer annotation of audio and video recordings at the micro- and macro-linguistic levels and can serve as a source for qualitative and quantitative research on various aspects of speech in aphasia and right hemisphere damage.
This volume is the third issue of a corpus-based grammar of Russian. It deals with parts of speech and, more generally, with formal classes of the lexicon, and comprises descriptive papers on individual parts of speech and on minor word classes.
Existing research shows that the distribution of the speaker's attention among an event's protagonists affects syntactic choice during sentence production. One debated issue is how much attention contributes to syntactic choice in languages that rely more on word order arrangement than on the choice of the overall syntactic frame. To address this, the present study used a sentence production task in which native speakers of Russian verbally described visually perceived transitive events. Before the target event was shown, a visual cue directed the participants' attention to the location of either the agent or the patient of that event. We also manipulated event orientation (agent-left vs. agent-right) as another potential contributor to syntactic choice. The dependent variable was the number of patient-initial sentences, compared across conditions. First, the results replicated the effect of visual cueing on word order in Russian: participants produced more patient-initial sentences in the patient-cued condition. Second, we found a novel effect of event orientation: native speakers of Russian produced more patient-initial sentences after seeing events unfolding from right to left than after left-to-right events. Our study provides new evidence on the role of the speaker's attention and of event orientation in syntactic choice in a language with flexible word order.
This paper deals with strategies of predicate agreement with quantified noun phrases headed by quantifier nouns. In Russian, as in other Slavic languages, a predicate agreeing with a quantified noun phrase may take either a singular or a plural form. With the quantifier nouns r’ad, polovina, chast’, and mnozestvo, three agreement strategies are possible: the predicate agrees grammatically with the head of the noun phrase and takes the singular form of the head's gender (masculine, feminine, or neuter); it agrees semantically, in the plural; or it takes the default form, the singular neuter. The last type of agreement is rare and non-standard. The most frequent is the first type, full grammatical agreement.
The study, based on the Russian National Corpus, showed that the strategies of predicate agreement differ across these quantifier nouns: the predicate is more likely to agree in the plural with an NP headed by r’ad than with NPs headed by polovina, chast’, or mnozestvo.
The paper analyses the reasons for these differences in agreement strategy and the contextual factors that influence the choice of predicate form.
Investigating the differences in agreement strategy requires considering the semantic and grammatical properties of the quantifiers.
It is shown that some quantifiers (r’ad, mnozestvo) have an indefinite, abstract meaning and restricted grammatical properties (a limited ability to take an attributive modifier or to be used without a dependent word), which sets them apart from ordinary nouns. As Corbett has shown [Corbett 1979, Krasovitsky 2010], such non-noun-like properties should be the main source of variation in the choice of predicate form. The quantifiers with a more specific, substantive meaning (polovina, chast’), which behave like nouns, seem to require full grammatical agreement of the predicate.
The paper also discusses how the choice of predicate form depends on grammatical gender.
A statistical analysis of the influence of contextual factors is carried out. The factors considered are animacy, word order, conjoined subjects, conjoined predicates, predicate type, and adjectives agreeing with the quantifier. Some views on the influence of contextual factors that are generally accepted in Russian linguistics are refined. The study shows that only a few factors influence the choice of predicate form with polovina or chast’, chiefly conjoined noun phrases and animacy. Agreement with mnozestvo is influenced by more contextual factors, and all of the factors are highly important for predicate agreement with r’ad.
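The comparison of agreement strategies rests on how often each quantifier noun triggers plural agreement, overall and under different contextual factors. A minimal Python sketch of that tabulation follows. The record fields (`quantifier`, `animate`, `number`), the `plural_share` function, and the four observations are invented for illustration; they are not data from the Russian National Corpus and do not reproduce any result of the study. Default singular neuter agreement is simply counted as non-plural here.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Hashable


def plural_share(
    observations: list[dict],
    key: Callable[[dict], Hashable],
) -> dict:
    """Share of plural predicates within each group defined by `key`.

    Any predicate number other than "pl" (grammatical singular or the
    default singular neuter) counts as non-plural.
    """
    totals: Counter = Counter()
    plurals: Counter = Counter()
    for obs in observations:
        group = key(obs)
        totals[group] += 1
        if obs["number"] == "pl":
            plurals[group] += 1
    return {group: plurals[group] / totals[group] for group in totals}


# Invented toy observations, one per (hypothetical) corpus example.
OBSERVATIONS = [
    {"quantifier": "mnozestvo", "animate": True, "number": "pl"},
    {"quantifier": "mnozestvo", "animate": False, "number": "sg"},
    {"quantifier": "polovina", "animate": True, "number": "sg"},
    {"quantifier": "polovina", "animate": False, "number": "sg"},
]

if __name__ == "__main__":
    # Plural share per quantifier noun.
    print(plural_share(OBSERVATIONS, key=lambda o: o["quantifier"]))
    # Plural share per quantifier noun and animacy of the quantified noun.
    print(plural_share(OBSERVATIONS, key=lambda o: (o["quantifier"], o["animate"])))
```

Grouping by a tuple key, as in the second call, yields the cross-tabulation that a significance test would start from; the abstract does not say which statistical test was used, so none is assumed here.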
Our paper investigates variation in the placement of lexical stress in past tense verb forms in Modern Standard Russian. This variation arose from complex interactions between various processes in the history of Russian. Its present-day state is said (often rather impressionistically) to be conditioned by intra-speaker sociolinguistic factors, but cases of inter-speaker variation can also be observed. We propose that stress placement in forms with variable stress is influenced by the rhythmic pattern of the immediate linear context. To support this, we report on a pilot experiment showing a preference for alternating rhythm in sequences of a transitive past tense verb form followed by its direct object, in line with the fundamental principle of rhythmic alternation. The results also raise some questions about the phonology of stress and stress variation in Russian and beyond.
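The principle of rhythmic alternation can be made concrete by counting stress clashes and lapses. The Python sketch below marks each syllable as stressed (1) or unstressed (0), counts a clash for every pair of adjacent stressed syllables and a lapse for every pair of adjacent unstressed syllables, and picks the verb stress variant that yields the fewest violations when followed by its object. The clash and lapse definitions, their equal weighting, the function names, and the example words are assumptions made for illustration; the abstract does not specify how the pilot experiment's stimuli were scored.

```python
from __future__ import annotations


def rhythm_violations(stresses: list[int]) -> tuple[int, int]:
    """Return (clashes, lapses) for a syllable sequence.

    1 marks a stressed syllable and 0 an unstressed one. A clash is a
    pair of adjacent stressed syllables; a lapse is a pair of adjacent
    unstressed syllables (a simplifying assumption).
    """
    clashes = lapses = 0
    for left, right in zip(stresses, stresses[1:]):
        if left == 1 and right == 1:
            clashes += 1
        elif left == 0 and right == 0:
            lapses += 1
    return clashes, lapses


def preferred_variant(variants: dict[str, list[int]], obj: list[int]) -> str:
    """Pick the verb stress variant that alternates best with its object.

    Clashes and lapses are weighted equally; on a tie, the variant
    listed first wins.
    """
    return min(variants, key=lambda v: sum(rhythm_violations(variants[v] + obj)))


# Hypothetical example: two stress variants of the past tense form "принял"
# (a capital letter marks the stressed vowel).
PRINJAL = {"прИнял": [1, 0], "принЯл": [0, 1]}
MERY = [1, 0]            # мЕры: stress on the first syllable
RESHENIE = [0, 1, 0, 0]  # решЕние: stress on the second syllable

if __name__ == "__main__":
    print(preferred_variant(PRINJAL, MERY))      # stem stress avoids a clash
    print(preferred_variant(PRINJAL, RESHENIE))  # final stress has fewer lapses
```

With the object мЕры, the stem-stressed variant gives a perfectly alternating 1 0 1 0 sequence, while the end-stressed variant creates a clash; with решЕние, the end-stressed variant leaves one lapse instead of two. These are predictions of the toy metric only, not findings reported in the paper.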
The book includes 64 papers submitted to the International Conference on Computational Linguistics and Intellectual Technologies "Dialogue 2019" and presents a broad spectrum of theoretical and applied research on natural language description, language modeling, and the creation of applied computer technologies.