Evaluating Anaphora and Coreference Resolution for Russian
The paper reports on the recent forum RU-EVAL ‒ a new initiative for evaluation of Russian NLP resources, methods and toolkits. The first two events were devoted to morphological and syntactic parsing correspondingly. The third event is devoted to anaphora and coreference resolution. Seven participating IT companies and academic institutions submitted their results for anaphora resolution task and three of them presented the results of coreference resolution task as well. The event was organized in order to estimate the state of the art for this NLP task in Russian and to compare various methods and principles implemented for Russian. We discuss the evaluation procedure. The anaphora and coreference tasks are specified in the present work. The phenomena taken into consideration are described. We also give a brief outlook of the similar evaluation events whose experience we lay upon. In our work we formulate the training and Gold Standard corpora construction guidelines and present the measures used in evaluation.
Proceeding of the 15th International Conference on Artificial Intelligence: Methodology, Systems, Applications , AIMSA 2012, Varna, Bulgaria, September 12-15, 2012.
This paper is an overview of the current issues and tendencies in Computational linguistics. The overview is based on the materials of the conference on computational linguistics COLING’2012. The modern approaches to the traditional NLP domains such as pos-tagging, syntactic parsing, machine translation are discussed. The highlights of automated information extraction, such as fact extraction, opinion mining are also in focus. The main tendency of modern technologies in Computational linguistics is to accumulate the higher level of linguistic analysis (discourse analysis, cognitive modeling) in the models and to combine machine learning technologies with the algorithmic methods on the basis of deep expert linguistic knowledge.
Compared with the area of spatial relations force interactions haven’t been in the limelight of attention of ontologists working on natural language processing. This article gives an example of text meaning representation based on the ontology and the lexicon of force interactions.
This paper concerns discourse-new mention detection in Russian. This might be helpful for different NLP applications such as coreference resolution, protagonist identification, summarization and different tasks of information extraction to detect the mention of an entity newly introduced into discourse. In our work, we are dealing with the Russian where there is no grammatical devices, like articles in English, for the overt marking a newly introduced referent. Our aim is to check the impact of various features on this task. The focus is on specific devices for introducing a new discourse prominent referent in Russian specified in theoretical studies. We conduct a pilot study of features impact and provide a series of experiments on detecting the first mention of a referent in a non-singleton coreference chain, drawing on linguistic insights about how a prominent entity introduced into discourse is affected by structural, morphological and lexical features.
In this paper, we consider opinion word extraction, one of the key problems in sentiment analysis. Sentiment analysis (or opinion mining) is an important research area within computational linguistics. Opinion words, which form an opinion lexicon, describe the attitude of the author towards certain opinion targets, i.e., entities and their attributes on which opinions have been expressed. Hence, the availability of a representative opinion lexicon can facilitate the extraction of opinions from texts. For this reason, opinion word mining is one of the key issues in sentiment analysis. We designed and implemented several methods for extracting opinion words. We evaluated these approaches by testing how well the resulting opinion lexicons help improve the accuracy of methods for determining the polarity of the reviews if the extracted opinion words are used as features. We used several machine learning methods: SVM, Logistic Regression, Naive Bayes, and KNN. By using the extracted opinion words as features we were able to improve over the baselines in some cases. Our experiments showed that, although opinion words are useful for polarity detection, they are not su fficient on their own and should be used only in combination with other features.
The CCIS series is devoted to the publication of proceedings of computer science conferences. Its aim is to efficiently disseminate original research results in informatics in printed and electronic form. While the focus is on publication of peer-reviewed full papers presenting mature work, inclusion of reviewed short papers reporting on work in progress is welcome, too. Besides globally relevant meetings with internationally representative program committees guaranteeing a strict peer-reviewing and paper selection process, conferences run by societies or of high regional or national relevance are also considered for publication.
Concept discovery is a Knowledge Discovery in Databases (KDD) research field that uses human-centered techniques such as Formal Concept Analysis (FCA), Biclustering, Triclustering, Conceptual Graphs etc. for gaining insight into the underlying conceptual structure of the data. Traditional machine learning techniques are mainly focusing on structured data whereas most data available resides in unstructured, often textual, form. Compared to traditional data mining techniques, human-centered instruments actively engage the domain expert in the discovery process. This volume contains the contributions to CDUD 2011, the International Workshop on Concept Discovery in Unstructured Data (CDUD) held in Moscow. The main goal of this workshop was to provide a forum for researchers and developers of data mining instruments working on issues with analyzing unstructured data. We are proud that we could welcome 13 valuable contributions to this volume. The majority of the accepted papers described innovative research on data discovery in unstructured texts. Authors worked on issues such as transforming unstructured into structured information by amongst others extracting keywords and opinion words from texts with Natural Language Processing methods. Multiple authors who participated in the workshop used methods from the conceptual structures field including Formal Concept Analysis and Conceptual Graphs. Applications include but are not limited to text mining police reports, sociological definitions, movie reviews, etc.
The paper concerns discourse-new referent detection. The task of coreference resolution is essential in many text-mining applications. The focus in this task is to detect noun phrases (NPs) that refer to the same entity. In languages without articles, there are no overt grammatical clues in an NP for whether it introduces a new referent into discourse or it refers to one of before-mentioned entities. However, there are some theoretical researches which claim that referent first-mentioning NPs have some specific features. In our research, we examine features that serve as discourse-new detectors for NPs corresponding to discourse salient referents and provide an experiment on different features contribution to this detection. The first-mention detection could help the quality of coreference resolution systems.