MULTI-LEVEL STUDENT ESSAY FEEDBACK IN A LEARNER CORPUS
The paper presents the results of using computer tools and applications for automated and semi-automated syntactic, lexical, and error analysis of student essays in a learner corpus. The texts in the corpus were written in English by Russian learners of English. The experiment compared parameters of different types and at different levels in the essays graded highest and those graded lowest by professional examiners in a pool of about 2000 essays. At the first stage of the experiment, the authors applied a syntactic parsing tool to the sentences, then analyzed the results of lexical observations in those texts, and finally collected statistics on the errors marked in manual expert annotation. The authors regard the parameters whose values differed markedly between the “good” and the “bad” essays as worthwhile components of the feedback a student should receive for a text uploaded into the learner corpus.
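The comparison of "good" and "bad" essays can be illustrated with a minimal sketch. The abstract does not say how the difference between the two groups was measured, so the standardized mean difference (Cohen's d), the 0.8 threshold, the feature names, and the toy values below are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def effect_size(high, low):
    """Cohen's d with a pooled standard deviation (an assumed measure,
    not one named in the abstract)."""
    n1, n2 = len(high), len(low)
    pooled_var = ((n1 - 1) * stdev(high) ** 2 + (n2 - 1) * stdev(low) ** 2) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0
    return (mean(high) - mean(low)) / pooled_var ** 0.5


def discriminating_features(best, worst, threshold=0.8):
    """Return (feature, d) pairs with |d| >= threshold, largest |d| first.

    `best` and `worst` map a feature name to its per-essay values in the
    top-graded and bottom-graded groups. The threshold is hypothetical.
    """
    scored = [(name, effect_size(best[name], worst[name])) for name in best]
    kept = [(name, d) for name, d in scored if abs(d) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-essay measurements, invented for this example.
best = {
    "mean_sentence_length": [21.0, 19.5, 22.3, 20.1],
    "errors_per_100_words": [2.1, 1.8, 2.5, 2.0],
    "type_token_ratio": [0.52, 0.47, 0.50, 0.49],
}
worst = {
    "mean_sentence_length": [13.2, 14.8, 12.5, 15.0],
    "errors_per_100_words": [7.9, 6.5, 8.2, 7.1],
    "type_token_ratio": [0.50, 0.48, 0.51, 0.47],
}

for name, d in discriminating_features(best, worst):
    print(f"{name}: d = {d:.2f}")
```

Under these assumptions, only features whose group means differ strongly relative to their spread would be kept as candidates for feedback; the rest are dropped.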
This paper presents an algorithm that allows the user to issue a query pattern, collects multi-word expressions (MWEs) that match the pattern, and then ranks them in a uniform fashion. This is achieved by quantifying the strength of all possible relations between the tokens and their features in the MWEs. The algorithm collects the frequencies of the morphological categories of the given pattern on a unified scale in order to choose the stable categories and their values. For every part of speech, and for all of its categories, we calculate a normalized Kullback-Leibler divergence between the category’s distribution in the pattern and its distribution in the corpus overall. Categories with the largest divergence are considered to be the most significant. The particular values of the categories are sorted according to a frequency ratio. As a result, we obtain morphosyntactic profiles of a given pattern, which include the most stable categories of the pattern and their values.
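The ranking step can be sketched as follows. This is a sketch under stated assumptions rather than the authors' implementation: the abstract does not specify how the divergence is normalized or how zero counts are handled, so the division by the logarithm of the number of category values and the additive smoothing are assumptions, as are all function names and the toy data.

```python
import math
from collections import Counter


def distribution(values, support, alpha=1.0):
    """Additively smoothed relative frequencies of `values` over `support`.
    Smoothing (alpha) is an assumption made to avoid zero probabilities."""
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return {v: (counts[v] + alpha) / total for v in support}


def normalized_kl(pattern_values, corpus_values):
    """KL(pattern || corpus) divided by log(number of category values).
    Dividing by log(k) is an assumed normalization that makes categories
    with different numbers of values roughly comparable."""
    support = sorted(set(pattern_values) | set(corpus_values))
    if len(support) < 2:
        return 0.0
    p = distribution(pattern_values, support)
    q = distribution(corpus_values, support)
    kl = sum(p[v] * math.log(p[v] / q[v]) for v in support)
    return kl / math.log(len(support))


def rank_categories(pattern_obs, corpus_obs):
    """Sort categories by decreasing divergence (most significant first)."""
    scores = {cat: normalized_kl(pattern_obs[cat], corpus_obs[cat]) for cat in pattern_obs}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def rank_values(pattern_values, corpus_values):
    """Sort the values of one category by the ratio of their smoothed
    frequency in the pattern to their smoothed frequency in the corpus."""
    support = sorted(set(pattern_values) | set(corpus_values))
    p = distribution(pattern_values, support)
    q = distribution(corpus_values, support)
    return sorted(((v, p[v] / q[v]) for v in support), key=lambda item: item[1], reverse=True)


# Toy observations for one part of speech (invented for illustration).
pattern_obs = {"case": ["gen"] * 9 + ["nom"], "number": ["sg"] * 5 + ["pl"] * 5}
corpus_obs = {
    "case": ["nom"] * 40 + ["gen"] * 20 + ["acc"] * 40,
    "number": ["sg"] * 50 + ["pl"] * 50,
}

for category, score in rank_categories(pattern_obs, corpus_obs):
    print(category, round(score, 3), rank_values(pattern_obs[category], corpus_obs[category])[0][0])
```

In this toy setting the case category diverges from the corpus distribution while number does not, so case would be treated as the stable category of the pattern, with the genitive as its most characteristic value.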
The paper describes the structure and possible applications of the theory of K-representations (knowledge representations) in bioinformatics and in the development of a new-generation Semantic Web. It is an original theory of designing semantic-syntactic analyzers of natural language (NL) texts with the broad use of formal means for representing input, intermediate, and output data. The current version of the theory is set forth in a monograph by V. Fomichov (Springer, 2010). The first part of the theory is a formal model describing a system consisting of ten operations on conceptual structures. This model defines a new class of formal languages, the class of SK-languages. The broad possibilities of constructing semantic representations of complex discourses pertaining to biology are shown. A new formal approach to developing multilingual algorithms of semantic-syntactic analysis of NL texts is outlined. This approach is implemented as a program in Python.
This paper is an overview of current issues and tendencies in computational linguistics. The overview is based on the materials of the COLING 2012 conference on computational linguistics. Modern approaches to traditional NLP domains such as POS tagging, syntactic parsing, and machine translation are discussed. Highlights of automated information extraction, such as fact extraction and opinion mining, are also in focus. The main tendency of modern technologies in computational linguistics is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into the models and to combine machine learning technologies with algorithmic methods based on deep expert linguistic knowledge.
The book contains the proceedings of the 18th International Conference on Automatic Processing of Natural Language (Montpellier, France, 27 June - 1 July 2011).
A framework for fast text analysis, developed as part of the Texterra project, is described. Texterra provides a scalable solution for fast text processing based on novel methods that exploit knowledge extracted from the Web and text documents. The developed tools, details of the project, use cases, and evaluation results are presented.
This workshop is about major challenges in the overall process of MWE treatment, from both the theoretical and the computational viewpoint, focusing on original research related to the following topics: manually and automatically constructed resources; representation of MWEs in dictionaries and ontologies; MWEs in linguistic theories like HPSG, LFG and minimalism; MWEs and user interaction; multilingual acquisition; multilingualism and MWE processing; models of first and second language acquisition of MWEs; crosslinguistic studies on MWEs; the role of MWEs in the domain adaptation of parsers; integration of MWEs into NLP applications; evaluation of MWE treatment techniques; lexical, syntactic or semantic aspects of MWEs.
The present paper is a comparative corpus study of the verbal expression of emotional etiquette in American English and Russian. The study is conducted against the backdrop of certain assumptions regarding the cross-cultural centrality and marginality of emotions as formulated in current research on cross-cultural pragmatics. The paper employs corpus-based methods to test the frequencies of the linguistic expression of different types of emotions in Russian and American English as encountered in diagnostic contexts of first-person reporting. Contrary to many currently accepted theories, the present study demonstrates no absolute prevalence of positive or ethical over negative or non-ethical emotions in Russian or American English. It also disproves certain more specific claims (the predominance of ‘pity’ in Russian), while confirming others (the prominence of ‘shame’ in Russian). Certain tendencies in emotional etiquette lean toward cross-cultural universality (e.g., ‘gratitude’ as the most frequently expressed emotion), while others differ. Overall, Russian speakers tend to report more passive negative emotions (‘fear’), while English speakers prefer reporting active negative emotions (‘anger’). Russian speakers are more “self-deprecating” than English speakers, as they favor expressing ‘shame’ over ‘pride’. At the same time, they show less empathy with the addressee, reporting more ‘contempt’-like and fewer ‘pity’-like emotions. The results obtained in this study can be useful for understanding and formulating culturally specific pragmatic peculiarities and hence preferred conversational strategies in the two languages.
The choice of an appropriate referential expression (definite description, proper name or pronoun) depends on multiple factors. This paper focuses on how the possessor position of a referential expression and its antecedent affect referential choice. Other factors, such as the syntactic role, form and definiteness of the antecedent and the animacy of the referent, are also considered. The study is based on a subcorpus of the specially designed RefRhet corpus.