THE ROLE AND APPLICATIONS OF EXPERT ERROR ANNOTATION IN A CORPUS OF ENGLISH LEARNER TEXTS
The paper presents the rationale for the decisions taken in setting up and further developing a learner corpus of student texts written in English by Russian learners of English, the only open-access Russian learner corpus. The focus is on manual expert annotation: after introducing the error categorization applied in annotation, the paper examines the complicated cases that arose in annotation practice and then compares the annotation statistics over the three stages of corpus development. For that purpose, texts annotated by different groups of participants in two experiments were used to spot the problematic areas in annotation. The concluding part outlines the main pedagogical applications of the learner corpus in teaching EFL: the opportunities to create automated training exercises and placement and progress tests custom-made for specific groups of students.
The paper describes the structure and possible applications of the theory of K-representations (knowledge representations) in bioinformatics and in the development of a new generation of the Semantic Web. It is an original theory of designing semantic-syntactic analyzers of natural language (NL) texts with broad use of formal means for representing input, intermediate, and output data. The current version of the theory is set forth in a monograph by V. Fomichov (Springer, 2010). The first part of the theory is a formal model describing a system of ten operations on conceptual structures. This model defines a new class of formal languages – the class of SK-languages. The paper shows the broad possibilities of constructing semantic representations of complex discourses pertaining to biology and outlines a new formal approach to developing multilingual algorithms for the semantic-syntactic analysis of NL texts. This approach is implemented in a program written in Python.
This paper is an overview of current issues and tendencies in computational linguistics, based on the materials of the COLING’2012 conference on computational linguistics. Modern approaches to traditional NLP domains such as POS tagging, syntactic parsing, and machine translation are discussed. The highlights of automated information extraction, such as fact extraction and opinion mining, are also in focus. The main tendency of modern technologies in computational linguistics is to incorporate higher levels of linguistic analysis (discourse analysis, cognitive modeling) into the models and to combine machine learning technologies with algorithmic methods based on deep expert linguistic knowledge.
The book contains the proceedings of the 18th International Conference on Automatic Processing of Natural Language (France, Montpellier, 27th June - 1st July 2011).
The project we present, the Russian Learner Translator Corpus (RusLTC), is a multiple learner translator corpus which stores Russian students’ translations from and into English. The project is being developed by a cross-functional team of translator trainers and computational linguists in Russia. Translations are collected from several Russian universities; all translations are made as part of routine and exam assignments or as submissions for translation contests by students majoring in translation. As of March 2014 RusLTC contains a total of nearly 1.2 million word tokens, 258 source texts, and 1,795 translations. The paper gives a brief overview of the related research and describes the corpus structure and the corpus-building technologies used; it also covers the query tool features and our error annotation solutions. In the final part we summarize RusLTC-based research and its current practical applications, and suggest research prospects and possibilities.
A framework for fast text analysis, developed as part of the Texterra project, is described. Texterra provides a scalable solution for fast text processing based on novel methods that exploit knowledge extracted from the Web and text documents. Details of the project, use cases, and evaluation results are presented for the developed tools.
This workshop is about major challenges in the overall process of MWE treatment, both from the theoretical and the computational viewpoint, focusing on original research related to the following topics: manually and automatically constructed resources; representation of MWEs in dictionaries and ontologies; MWEs in linguistic theories like HPSG, LFG and minimalism; MWEs and user interaction; multilingual acquisition; multilingualism and MWE processing; models of first and second language acquisition of MWEs; crosslinguistic studies on MWEs; the role of MWEs in the domain adaptation of parsers; integration of MWEs into NLP applications; evaluation of MWE treatment techniques; lexical, syntactic or semantic aspects of MWEs.
The present paper is a comparative corpus study of the verbal expression of emotional etiquette in American English and Russian. The study is conducted against the backdrop of certain assumptions regarding the cross-cultural centrality and marginality of emotions as formulated in the current research on cross-cultural pragmatics. The paper employs corpus-based methods to test the frequencies of the linguistic expression of different types of emotions in Russian and American English as encountered in diagnostic contexts of first-person reporting. Contrary to many currently-accepted theories, the present study demonstrates no absolute prevalence of positive or ethical over negative or non-ethical emotions in Russian or American English. It also disproves certain more specific claims (the predominance of ‘pity’ in Russian), while confirming others (prominence of ‘shame’ in Russian). Certain tendencies in emotional etiquette lean toward cross-cultural universality (e.g., ‘gratitude’ as the most frequently expressed emotion), while others differ. Overall, Russian speakers tend to report more passive negative emotions (‘fear’), while English speakers prefer reporting active negative emotions (‘anger’). Russian speakers are more “self-deprecating” than English speakers, as they favor expressing ‘shame’ over ‘pride’. At the same time, they show less empathy with the addressee, reporting more ‘contempt’-like and less ‘pity’-like emotions. The results obtained in this study can be useful for understanding and formulating culturally-specific pragmatic peculiarities and hence preferred conversational strategies in the two languages.
The choice of an appropriate referential expression (definite description, proper name or pronoun) depends on multiple factors. This paper focuses on how the possessor position of a referential expression and of its antecedent affects referential choice. Other factors, such as the syntactic role, form and definiteness of the antecedent and the animacy of the referent, are also considered. The study is based on a subcorpus of the specially designed RefRhet corpus.