Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2015)
The volume includes 69 papers read at Dialogue 2015, the International Conference on Computational Linguistics and AI Applications, which covers a wide range of theoretical and applied research in natural language description, linguistic process modelling and the creation of natural language applications.
When describing words which denote real-life objects, dictionaries tend to use scientific terms and classifications, even when dealing with natural language. This approach may lead to misunderstanding, especially in cases when scientific classification (e.g. in biology) differs from what is found in natural language data. One such case is discussed here, namely the small but rather interesting class of nuts (Russian orexi). In the botanical world view nuts usually include hazelnuts and chestnuts, but do not include walnuts or almonds (which are considered stone fruits), pine nuts (seeds), peanuts (legumes), pistachios (kernels), etc. The Russian orex, English nut and Latin nux exhibit similar behaviour here. Explanatory dictionaries of Russian more or less follow the botanical definitions, even though in many fields (such as cooking, food industry, medicine, etc.) nuts are classified differently. In order to establish the boundaries of nuts in Russian, more than 1000 native speakers were questioned and multiple texts of different periods were studied. The result is a peculiar class which could not be identified with any of the natural language supercategories described by Anna Wierzbicka. A new lexicographic description is proposed for some words included in this class.
The Russian subjunctive is expressed by an analytical form which consists of the subjunctive particle by (b) and a past indicative, an infinitive, or one of a few predicative adverbs and adjectives. The subjunctive particle is an enclitic. It often merges with subordinate conjunctions, which yields words functioning as conjunctions and containing the subjunctive particle. Historically, the particle by in conjunctions can be traced back to the marker of subjunctive. Synchronically, however, the group is not homogeneous. The aim of the paper is to find out which of the conjunctions with by should be considered as containing the marker of subjunctive, and to test whether the particle can or cannot be separated from the conjunction. Four criteria are used. The first and the second, namely (a) the forms available in the subordinate clause with the conjunction and (b) the possibility of repeating the particle by with the second predicate, show that comparative conjunctions do not synchronically contain the subjunctive marker. The third and fourth criteria, namely (c) the omission of the particle by and (d) its ability to be separated from the conjunction by other words, give different results.
The problem of morphological ambiguity is widely addressed in modern NLP. Ambiguity is mostly resolved with the use of large manually annotated corpora and machine learning. However, such methods are not always available, as good training data is not accessible for all languages. In this paper we present a method of disambiguation without gold standard corpora using several statistical models, namely the Brill algorithm (Brill 1995) and unambiguous n-grams from the automatically annotated corpus. All the methods were tested on the Corpus of Modern Greek and on the Corpus of Modern Yiddish. As a result, more than half of the words with ambiguous analyses were disambiguated in both corpora, with high precision (>80%). Our method of morphological disambiguation demonstrates that it is possible to eliminate some of the ambiguous analyses in the corpus without specific linguistic resources, using only raw data in which all possible morphological analyses for every word are indicated.
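The idea of using unambiguous n-grams from an automatically annotated corpus can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the details of their models, so the bigram window, the `min_count` threshold, the function names and the toy English data below are all assumptions made for illustration. The sketch counts tag bigrams in which both tokens have exactly one analysis, then uses those counts to prune analyses of an ambiguous token whose left neighbour is unambiguous.

```python
from collections import Counter


def collect_unambiguous_bigrams(corpus):
    """Count tag bigrams where both adjacent tokens have exactly one analysis.

    `corpus` is a list of sentences; each sentence is a list of
    (word, analyses) pairs, where `analyses` lists all candidate tags.
    This data layout is a hypothetical one chosen for the sketch.
    """
    counts = Counter()
    for sentence in corpus:
        for (_, left), (_, right) in zip(sentence, sentence[1:]):
            if len(left) == 1 and len(right) == 1:
                counts[(left[0], right[0])] += 1
    return counts


def disambiguate(sentence, bigram_counts, min_count=1):
    """Prune analyses of ambiguous tokens using an unambiguous left neighbour.

    An analysis is kept only if its bigram with the neighbour's tag was seen
    at least `min_count` times (an assumed threshold). If no analysis is
    supported, the token is left untouched rather than emptied.
    """
    result = [(word, list(tags)) for word, tags in sentence]
    for i in range(1, len(result)):
        word, tags = result[i]
        prev_tags = result[i - 1][1]
        if len(tags) > 1 and len(prev_tags) == 1:
            supported = [
                tag for tag in tags
                if bigram_counts[(prev_tags[0], tag)] >= min_count
            ]
            if supported and len(supported) < len(tags):
                result[i] = (word, supported)
    return result


# Toy automatically annotated corpus (invented data, not from the paper).
corpus = [
    [("the", ["DET"]), ("dog", ["NOUN"]), ("barks", ["VERB"])],
    [("a", ["DET"]), ("cat", ["NOUN"]), ("sleeps", ["VERB"])],
    [("the", ["DET"]), ("run", ["NOUN", "VERB"])],
]

counts = collect_unambiguous_bigrams(corpus)
print(disambiguate(corpus[2], counts))
```

In this toy run the ambiguous token "run" keeps only its NOUN analysis, because the bigram DET NOUN occurs in unambiguous contexts and DET VERB does not. A fuller system would presumably also consult the right-hand context and longer n-grams, and could combine this pruning with Brill-style transformation rules.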
The paper discusses the present stage of the evolution of the initial [n]/[j] stem alternation in Russian third person pronouns. After providing a short overview of the origins of the forms, I focus on their category status, discuss Zalizniak’s ‘adpositionality’ in some detail, and then proceed to consider the cases where the ‘n’-forms are induced by a distant ‘controller’. I will show that the fact that the ‘n’-forms are essentially variants is better accounted for by the notion of ‘trigger’ of a morphological variant. In my view, this opens the way to a better understanding of the observed evidence than either the conventional notion of morphosyntactic controller or, certainly, an explanation in (morpho)phonological terms. Finally, I will briefly argue that, in a sense, the evolution of the alternation is similar to degrammaticalization, showing a movement from a morphophonologically conditioned external sandhi to a morphosyntactic category similar to government.
This paper is a pilot comparative study of coreference chaining in three languages, namely Czech, English and Russian. We have analyzed 16 parallel English-Czech newspaper texts and 16 texts in Russian (similar to the English-Czech ones in length and topics). Our motivation was to find out what the linguistic structure of coreference chains in different languages is and what types of distinctions we should take into account to advance the development of coreference resolution systems. Taking into account theoretical approaches to the phenomenon of coreference, we based our research on the following assumption: the recognition of coreference links for different structural types of noun phrases is regulated by different language mechanisms. The other starting point was that different languages allow pronominal chaining of different length and that the properties of coreference chains differ for languages with different strategies for zero anaphora and different systems for definiteness marking. This work reports our first findings on comparing the distribution of structural NP types in the three languages under analysis.
Aphasia is a language impairment due to brain damage. Word-finding and word-retrieval problems can be very prominent in the speech of people with aphasia, being detectable in almost every aphasic speaker. On the other hand, word-finding difficulties and speech errors can sometimes occur in the speech of neurologically healthy people. It is assumed that the same psycholinguistic levels of word-retrieval breakdown can account for the mistakes of both groups. At the same time, retrieving a single word from the mental lexicon is not the only possible level of hindrance for a speaker: referential and lexical choices that take place at the more general discourse and pragmatic level can also be disturbed. The Russian CLiPS (Russian CLinical Pear Stories) is a corpus of film-elicited narratives collected from healthy and language-impaired cohorts following the methodology of Chafe (1980). The aim of our research was to investigate the characteristics of formal markers of word retrieval difficulties in narratives of neurologically healthy people and people with aphasia. Three types of markers (discourse markers, false starts and self-corrections) were considered in the nominations of common referents of Pear Stories narratives. The markers at different breakdown levels are qualitatively analysed, creating a platform for future analysis.