Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, 17-23 May 2010
The Bank of Russian Constructions and Valencies (Russian FrameBank) is an annotation project that takes as input samples from the Russian National Corpus (http://www.ruscorpora.ru). Russian verbs and predicates from other POS classes have idiosyncratic and not always predictable case patterns. For this reason, these words and their argument structures should be described as lexical constructions. The slots of partially filled phrasal constructions (e.g. vzjal i uexal ‘he suddenly (lit. took and) went away’) are also analyzed. Thus, the notion of construction is understood in the sense of Fillmore’s Construction Grammar and is not limited to the argument structure of verbs. FrameBank combines a dictionary of constructions with an annotated collection of examples. Our goal is to mark the set of arguments and adjuncts of each construction. The main focus is on how these elements are realized in running text, so that pattern realizations can be searched by combinations of features. The relevant dataset includes lexical, POS and other morphosyntactic tags and semantic classes. It also includes the grammatical constructions that introduce or license the use of elements within a given construction.
The article discusses the most recent trends in the development of the English progressive. A corpus-based approach to linguistic research is seen as an effective means of determining the reliability of the retrieved data. It also helps track the major diachronic trend: the increasing frequency of the progressive aspect since the beginning of the 20th century. The article specifically deals with the extension of the progressive to new constructions, such as the modal, present perfect and past perfect passive progressive. It also accounts for the use of progressive forms in contexts not generally characteristic of them.
The paper describes the structure and possible applications of the theory of K-representations (knowledge representations) in bioinformatics and in the development of a new-generation Semantic Web. It is an original theory of designing semantic-syntactic analyzers of natural language (NL) texts, and it makes broad use of formal means for representing input, intermediate, and output data. The current version of the theory is set out in a monograph by V. Fomichov (Springer, 2010). The first part of the theory is a formal model describing a system of ten operations on conceptual structures. This model defines a new class of formal languages, the class of SK-languages. The paper demonstrates the broad possibilities for constructing semantic representations of complex discourses pertaining to biology. A new formal approach to developing multilingual algorithms of semantic-syntactic analysis of NL texts is outlined. This approach is implemented as a program in Python.
This paper is devoted to the use of two tools for creating morphologically annotated linguistic corpora: UniParser and the EANC platform. The EANC platform is the database and search framework originally developed for the Eastern Armenian National Corpus (www.eanc.net) and later adopted for other languages. UniParser is an automated morphological analysis tool developed specifically for creating corpora of languages with relatively small numbers of native speakers for which the development of parsers from scratch is not feasible. It has been designed for use with the EANC platform and generates XML output in the EANC format.
UniParser and the EANC platform have already been used to create corpora of several languages: Albanian, Kalmyk, Lezgian, and Ossetic. Of these, the Ossetic corpus is the largest (5 million tokens, with 10 million planned for 2013). The tools are currently being used to build corpora of Buryat and Modern Greek. This paper describes the general architecture of the EANC platform and UniParser, using the Ossetic corpus to illustrate the advantages and disadvantages of the approach.
The book contains the proceedings of the 18th International Conference on Automatic Processing of Natural Language (Montpellier, France, 27 June - 1 July 2011).
Four electronic corpora created in 2011 within the framework of the “Corpus Linguistics: the Albanian, Kalmyk, Lezgian, and Ossetic Languages” Program of Fundamental Research of the RAS are presented. The interface and functionality of these corpora are described, the engineering problems that had to be solved in creating them are explained, and the prospects for their further development are discussed. Particular emphasis is placed on the compilation of dictionaries and the automatic grammatical markup of the corpora.
This paper is an overview of current issues and tendencies in computational linguistics, based on the materials of the conference on computational linguistics COLING’2012. Modern approaches to traditional NLP tasks such as POS tagging, syntactic parsing and machine translation are discussed. Highlights of automated information extraction, such as fact extraction and opinion mining, are also covered. The main tendency of modern technologies in computational linguistics has two parts. The first is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into the models. The second is to combine machine learning with algorithmic methods based on deep expert linguistic knowledge.
The project we present, the Russian Learner Translator Corpus (RusLTC), is a multiple learner translator corpus that stores Russian students’ translations from and into English. The project is being developed in Russia by a cross-functional team of translator trainers and computational linguists. Translations are collected from several Russian universities. All of them are made by students majoring in translation, either as part of routine and exam assignments or as submissions to translation contests. As of March 2014, RusLTC contains nearly 1.2 million word tokens in total, 258 source texts, and 1,795 translations. The paper gives a brief overview of related research and describes the corpus structure and the corpus-building technologies used. It also covers the features of the query tool and our error annotation solutions. In the final part we summarize RusLTC-based research and its current practical applications, and we suggest research prospects and possibilities.
This workshop is about major challenges in the overall process of multiword expression (MWE) treatment, from both the theoretical and the computational viewpoint. It focuses on original research related to the following topics: manually and automatically constructed resources; representation of MWEs in dictionaries and ontologies; MWEs in linguistic theories like HPSG, LFG and minimalism; MWEs and user interaction; multilingual acquisition; multilingualism and MWE processing; models of first and second language acquisition of MWEs; crosslinguistic studies on MWEs; the role of MWEs in the domain adaptation of parsers; integration of MWEs into NLP applications; evaluation of MWE treatment techniques; lexical, syntactic or semantic aspects of MWEs.
The paper studies the reaction of Italian literary critics to the publication of Boris Pasternak's novel "Doctor Zhivago". It analyzes the book ""Doctor Jivago", Pasternak, 1958, Italy", published in Russian (Moscow: Reka vremen, 2012). The paper also examines the articles of Italian writers, critics and literary historians who reacted immediately to the publication of the novel (A. Moravia, I. Calvino, F. Fortini, C. Cassola, C. Salinari, etc.).
The Incongruity Theory of Humor, in its different forms, states that the cause of laughter is the perception of something that violates our mental patterns and expectations. This seems particularly true of comic absurdity, which is based on a deadpan violation of established norms of logic and convention. The current paper explores the linguistic mechanisms that underlie the comic effects in the works of Mikhail Zoshchenko, one of the great satirists of Soviet Russia. Zoshchenko is well known for his simplified writing style. It imitates the language and mentality of “the simple people” while at the same time mocking the nascent Soviet officialdom and its demands for the popular accessibility of art. The paper considers Zoshchenko’s narrative through the prism of conventional implicatures (Grice 1961, Karttunen and Peters 1979, Horn 2004, Potts 2005, 2007). These are meanings that are not directly stated in an utterance but are implied by the speaker; e.g. Even John solved the problem implies that John was not expected to solve it. In successful communication, implicit meanings form the shared background of conversational partners, and violating these shared norms may be used to create a comic effect. One of the most conventionalized societal norms, and the one Zoshchenko violates most frequently, is the value of human life and, hence, a solemn attitude toward death. The narrator in Zoshchenko’s stories repeatedly implies otherwise, thus creating a comic portrait of the mentality of Homo Soveticus. Consider a quote from “The story about a greedy dairy woman”: “So, her husband died. At first she probably took it lightly. - A-a, she thought – no big deal… But then she realized – yes, this is a big deal!... Eligible bachelors are not running around in bunches. And then, of course, she started grieving” (a shift in emphasis: the cause for grief is not the husband’s death but its inconvenience for the surviving wife).
The story “A restless old man” is based on violating the same conventional implicature. It is about an old man in a communal flat who falls into a lethargic stupor, which his family and neighbors take for death, and who really dies after waking up. Throughout the story the narrator implicitly creates the image of death as an inconvenient occurrence and of a deceased person as an unwanted piece of waste. The harshly comic effect is achieved in two ways. The first is implicatures about the shallow emotional impact of death (“And then of course there is aggravation: because the room is small and here is a superfluous element”; “If my husband, this surviving idiot, ordered the hearse right away, then the wait for it would have only been three days”; “The summoned doctor reassured everybody that now the old man is bona fide dead”). The second is violations of semantic compatibility rules, whereby the seemingly dead old man is referred to alternately as an animate being (“The dead man is lying and demanding the last tribute to be paid to him”; “The babysitter is afraid to be in the room where a dead person is living”) and as an inanimate object (“There is so little space that there is even nowhere to pile up the old man”; “I am going to pile him up in the hall, let him wait for the hearse there”).
The article describes the patterns in which emotional utterances are realized in dialogic and monologic speech. The author pays special attention to two topics. The first is the characteristic features of the speech of a speaker under psychological tension. The second is the compositional and pragmatic features of dialogic and monologic texts.
The article examines the main trends in the study of the Stalinist period and the phenomenon of Stalinism in connection with the mass opening of the archives.