Learning Computational Linguistics through NLP Evaluation Events: the experience of Russian evaluation initiative
This paper presents our experience of involving students of the Department of Theoretical and Computational Linguistics at Moscow State University in the full cycle of preparing the NLP evaluation forums held in Russia in 2010 and 2012 and evaluating their results. The 2010 forum started as a new initiative and was the first independent evaluation of morphological parsers for Russian held in Russia. At the same time, the forum campaign gave rise to a successful academic course, which produced a close-knit student team strong enough to carry out the two-year research for the second forum, on syntax, held in 2012. The new forum on anaphora (to be held in 2014) is now being prepared mostly by students.
This paper presents an algorithm that allows the user to issue a query pattern, collects multi-word expressions (MWEs) that match the pattern, and then ranks them in a uniform fashion. This is achieved by quantifying the strength of all possible relations between the tokens and their features in the MWEs. The algorithm collects the frequencies of the morphological categories of the given pattern on a unified scale in order to choose the stable categories and their values. For every part of speech, and for all of its categories, we calculate a normalized Kullback-Leibler divergence between the category’s distribution in the pattern and its distribution in the corpus overall. Categories with the largest divergence are considered to be the most significant. The particular values of the categories are sorted according to a frequency ratio. As a result, we obtain morphosyntactic profiles of a given pattern, which include the most stable categories of the pattern and their values.
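The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the divergence is normalized or how zero corpus frequencies are handled, so dividing by the log of the number of distinct values and applying a small probability floor are assumptions made here for illustration only. The function names and the counts are hypothetical.

```python
import math
from collections import Counter


def distribution(counts):
    """Turn raw counts of category values into relative frequencies."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def normalized_kl(pattern_counts, corpus_counts, floor=1e-9):
    """KL divergence D(pattern || corpus) for one morphological category.

    Assumption: normalization divides by log2 of the number of distinct
    values; the abstract does not specify the normalization actually used.
    Assumption: corpus probabilities are floored to avoid division by zero.
    """
    p = distribution(pattern_counts)
    q = distribution(corpus_counts)
    values = set(p) | set(q)
    if len(values) < 2:
        return 0.0
    kl = sum(
        p_v * math.log2(p_v / max(q.get(v, 0.0), floor))
        for v, p_v in p.items()
        if p_v > 0
    )
    return kl / math.log2(len(values))


def rank_categories(pattern, corpus):
    """Sort categories by divergence, most significant first."""
    scores = {cat: normalized_kl(pattern[cat], corpus[cat]) for cat in pattern}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def rank_values(pattern_counts, corpus_counts, floor=1e-9):
    """Sort the values of one category by pattern/corpus frequency ratio."""
    p = distribution(pattern_counts)
    q = distribution(corpus_counts)
    ratios = {v: p_v / max(q.get(v, 0.0), floor) for v, p_v in p.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts for the noun slot of some query pattern.
pattern = {
    "case": Counter({"gen": 90, "nom": 10}),
    "number": Counter({"sg": 50, "pl": 50}),
}
corpus = {
    "case": Counter({"gen": 30, "nom": 70}),
    "number": Counter({"sg": 55, "pl": 45}),
}

ranked = rank_categories(pattern, corpus)
top_values = rank_values(pattern["case"], corpus["case"])
print(ranked)
print(top_values)
```

With these made-up counts, case diverges from the corpus distribution far more than number does, so case would be treated as the stable category, and the genitive would be its most characteristic value.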
The paper describes the structure and possible applications of the theory of K-representations (knowledge representations) in bioinformatics and in the development of a new generation of the Semantic Web. It is an original theory of designing semantic-syntactic analyzers of natural language (NL) texts with broad use of formal means for representing input, intermediate, and output data. The current version of the theory is set forth in a monograph by V. Fomichov (Springer, 2010). The first part of the theory is a formal model describing a system of ten operations on conceptual structures. This model defines a new class of formal languages, the class of SK-languages. The broad possibilities of constructing semantic representations of complex discourses pertaining to biology are shown. A new formal approach to developing multilingual algorithms for the semantic-syntactic analysis of NL texts is outlined. This approach is implemented as a program in Python.
This paper is an overview of current issues and trends in computational linguistics. The overview is based on the materials of the COLING 2012 conference on computational linguistics. Modern approaches to traditional NLP domains such as POS tagging, syntactic parsing, and machine translation are discussed. Highlights of automated information extraction, such as fact extraction and opinion mining, are also in focus. The main trend in modern computational linguistics technologies is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into the models and to combine machine learning with algorithmic methods based on deep expert linguistic knowledge.
The book contains the proceedings of the 18th International Conference on Automatic Processing of Natural Language (Montpellier, France, 27 June - 1 July 2011).
Opinions of professors and chairmen of chambers of appeals on the quality of teaching in universities' law schools in imperial Russia in the late 19th - early 20th century are discussed.
This workshop is about major challenges in the overall process of MWE treatment, both from the theoretical and the computational viewpoint, focusing on original research related to the following topics: manually and automatically constructed resources; representation of MWEs in dictionaries and ontologies; MWEs in linguistic theories like HPSG, LFG and minimalism; MWEs and user interaction; multilingual acquisition; multilingualism and MWE processing; models of first and second language acquisition of MWEs; crosslinguistic studies on MWEs; the role of MWEs in the domain adaptation of parsers; integration of MWEs into NLP applications; evaluation of MWE treatment techniques; lexical, syntactic or semantic aspects of MWEs.
The paper reports on the recent forum RU-EVAL, a new initiative for the evaluation of Russian NLP resources, methods and toolkits. The first two events were devoted to morphological and syntactic parsing, respectively. The third event is devoted to anaphora and coreference resolution. Seven participating IT companies and academic institutions submitted results for the anaphora resolution task, and three of them also presented results for the coreference resolution task. The event was organized in order to assess the state of the art for this NLP task in Russian and to compare the various methods and principles implemented for Russian. We discuss the evaluation procedure, specify the anaphora and coreference tasks, and describe the phenomena taken into consideration. We also give a brief overview of the similar evaluation events on whose experience we draw. Finally, we formulate guidelines for constructing the training and Gold Standard corpora and present the measures used in the evaluation.
Combinatorial abilities are fundamental to experimental thinking. The aim of this work was to design didactic objects that will stimulate preschoolers’ experimental thinking and to study young children’s thinking in relation to these objects. Six heuristic rules for the design of didactic objects are specified, and the responses of 623 children aged between 3 and 7 to the didactic objects are described in this paper. The first two calculating devices required rods to be pressed simultaneously for successive windows to be lit up or made visible. A total of 30 five-year-olds played with these for 20 minutes, and were seen to perform a logical series of actions in order to understand the device’s function. Half of the children counted the presses and thereby understood the way the device functioned. The second device was designed to allow all possible combinations of four variables. Sixty children between the ages of 4 and 6 played with the device for 20 minutes. A total of 88% of the children found all possible combinations of the device, with no differences between age groups in the strategies used. The third device had a matrix of shutters opened by buttons arrayed along two edges. In the first mode, single button presses opened the nearest windows and button presses along both edges opened windows on coordinates determined by the two buttons. In the second mode, single button presses opened nothing and simultaneous button presses along two edges opened windows on coordinates determined by the two buttons. Ninety children between the ages of 5 and 10 played with the device in the second mode for 20 minutes. The children used scientific strategies to discover the device’s function in the following proportions: 20% at five years, 50% at six years and 93% at 10 years. Eighteen children between the ages of 4 and 6 played with the device in the second mode. They played in pairs, and each child was assigned a row of buttons, so that co-operation was needed to open the windows requiring two coordinated button presses. All the children were eventually successful in the joint experimentation. The fourth device had 16 windows and eight buttons, which lit up the windows when pressed in logical combinations. A total of 20 five-year-old children were trained on this device to use combinations of button presses to light up selected windows. These children were then allowed to explore the third device in the second mode by themselves. The trained five-year-olds all used scientific strategies in their search for the third device’s combinations. The study showed that preschoolers can combine actions and discover hidden relationships, and that the didactic objects can be used to develop children’s thinking.