Features for Discourse-New Referent Detection in Russian
This paper concerns discourse-new mention detection in Russian. Detecting the mention of an entity newly introduced into discourse can help NLP applications such as coreference resolution, protagonist identification, summarization and various information extraction tasks. We deal with Russian, which has no grammatical devices, such as articles in English, for overtly marking a newly introduced referent. Our aim is to check the impact of various features on this task. The focus is on the specific devices for introducing a new discourse-prominent referent in Russian that are described in theoretical studies. We conduct a pilot study of feature impact and run a series of experiments on detecting the first mention of a referent in a non-singleton coreference chain, drawing on linguistic insights about how the introduction of a prominent entity into discourse is affected by structural, morphological and lexical features.
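The target labels for the task described above can be illustrated with a minimal sketch. It assumes that coreference chains are given as lists of character-offset spans and that mentions in singleton chains are treated as negative examples; the function name `label_discourse_new` and the data layout are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumption: a mention is identified by its (start, end) character offsets.
Mention = Tuple[int, int]


def label_discourse_new(chains: List[List[Mention]]) -> Dict[Mention, bool]:
    """Mark a mention True if it is the first mention of a non-singleton chain."""
    labels: Dict[Mention, bool] = {}
    for chain in chains:
        ordered = sorted(chain)  # textual order by start offset
        non_singleton = len(ordered) > 1
        for position, mention in enumerate(ordered):
            # Only the earliest mention of a chain with 2+ mentions is positive.
            labels[mention] = non_singleton and position == 0
    return labels


# Toy input: one chain with three mentions and one singleton chain.
chains = [
    [(40, 42), (0, 7), (90, 93)],
    [(15, 22)],
]
labels = label_discourse_new(chains)
```

In this toy input only the mention at `(0, 7)` is labelled discourse-new. A classifier would then be trained to predict these labels from structural, morphological and lexical features of each NP.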
The paper concerns discourse-new referent detection. Coreference resolution is essential in many text-mining applications. The goal of this task is to detect noun phrases (NPs) that refer to the same entity. In languages without articles, an NP carries no overt grammatical clue as to whether it introduces a new referent into discourse or refers to a previously mentioned entity. However, some theoretical studies claim that NPs that mention a referent for the first time have specific features. In our research, we examine features that serve as discourse-new detectors for NPs corresponding to discourse-salient referents and conduct an experiment on the contribution of different features to this detection. First-mention detection could improve the quality of coreference resolution systems.
The paper presents work on automatic Arabic dialect classification and proposes a machine learning classification method in which the training dataset consists of two corpora. The first is a small corpus of manually dialect-annotated instances. The second contains a large number of instances collected automatically from the Web using word-marks, the most distinctive and frequent dialectal words, as dialect identifiers. We consider the four most widely used Arabic dialects: Levantine, Egyptian, Saudi and Iraqi. The main benefit of this approach is that it reduces the time spent on manual annotation of social media data, because the emphasis is placed on the automatically created corpus. The best results were achieved with a Naïve Bayes classifier trained on character-based bigrams, trigrams and the word-marks vocabulary: classification precision reaches 0.92, with an F1-measure of 0.91, on a test set of instances taken from the manually annotated corpus.
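As a rough illustration of the kind of classifier mentioned above, the following is a minimal sketch of a multinomial Naïve Bayes model over character bigrams and trigrams with add-one smoothing. It is not the authors' implementation: the class name `CharNgramNaiveBayes`, the labels "X" and "Y" and the toy strings are hypothetical placeholders, and the word-marks vocabulary is left out.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def char_ngrams(text: str, sizes=(2, 3)) -> List[str]:
    """Return all character n-grams of the given sizes."""
    return [text[i:i + n] for n in sizes for i in range(len(text) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def fit(self, samples: List[Tuple[str, str]]) -> "CharNgramNaiveBayes":
        self.class_counts = Counter(label for _, label in samples)
        self.ngram_counts = defaultdict(Counter)
        for text, label in samples:
            self.ngram_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.ngram_counts.values() for g in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in char_ngrams(text):
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for dialect-annotated instances.
toy_samples = [
    ("aaaa abab", "X"),
    ("abab aaab", "X"),
    ("zzzz zyzy", "Y"),
    ("zyzy zzzy", "Y"),
]
model = CharNgramNaiveBayes().fit(toy_samples)
```

Here `model.predict("abaa")` returns the label whose training strings share the most character n-grams with the input. In the setting of the abstract, the small manually annotated corpus and the larger automatically collected one would both feed into `fit`.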
The paper presents a short summary of the applications of quantum logic categorical constructions to natural language processing. We give a brief overview of quantum logic in general and of its use in natural language processing in particular. We then discuss the comparison of sentences and their representation in the quantum logic formalism. Examples of quantum diagrams are considered in order to explain text analysis in terms of quantum logic techniques.
As the number of digital texts increases rapidly, there is a pressing need for more advanced and diverse natural language processing tools. While purely statistical approaches have proved powerful and efficient for many NLP tasks, many applications would benefit from the formal models and approaches that traditional language science has to offer. In the hope of facilitating this interaction between theory and practical implementation, we are pleased to announce the workshop on Computational Linguistics and Language Science, to be held in Moscow, Russia, on April 25, 2016 (11 AM to 6 PM).
This book constitutes the refereed proceedings of the 4th International Conference on Mining Intelligence and Knowledge Exploration, MIKE 2016, held in Mexico City, Mexico, in November 2016.
The 18 full papers presented were carefully reviewed and selected from 56 submissions.
Accepted papers were grouped into various subtopics including information retrieval, machine learning, pattern recognition, knowledge discovery, classification, clustering, image processing, network security, speech processing, natural language processing, language, cognition and computation, fuzzy sets, and business intelligence.
The field of dialogue systems and conversational agents is one of the most rapidly growing research areas in artificial intelligence applications. Business and industry are showing increasing interest in building intelligent conversational agents into their products. Many recent studies have focused on the possibility of developing task-oriented systems that can hold the long, free social chats that occur naturally in human interaction. Natural language understanding plays an extremely important role in interpreting the user's utterance and returning the correct information. Despite the progress made on NLP problems, natural language understanding remains very challenging in the field of dialogue systems. In this paper, we review recent progress in developing dialogue systems, their current architectural features and further prospects. We focus on the natural language understanding tasks that are key to building a good conversational agent, and then summarize NLP methods and frameworks so that researchers can study potential improvements to state-of-the-art dialogue systems. Additionally, we consider the concept of dialogue in the context of human-machine interaction and briefly describe dialogue evaluation metrics.
The paper reports on the recent forum RU-EVAL, a new initiative for the evaluation of Russian NLP resources, methods and toolkits. The first two events were devoted to morphological and syntactic parsing, respectively. The third event is devoted to anaphora and coreference resolution. Seven participating IT companies and academic institutions submitted results for the anaphora resolution task, and three of them also presented results for the coreference resolution task. The event was organized to estimate the state of the art for this NLP task in Russian and to compare the various methods and principles implemented for Russian. We discuss the evaluation procedure, specify the anaphora and coreference tasks, and describe the phenomena taken into consideration. We also give a brief overview of similar evaluation events on whose experience we build. Finally, we formulate guidelines for constructing the training and Gold Standard corpora and present the measures used in the evaluation.
Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference addressed in MUC, ACE, and OntoNotes, and have the impression that coreference is a well-worn task, owing in part to the large number of papers reporting results on the MUC/ACE/OntoNotes corpora. Given the plethora of work on entity coreference, and aware of other fora gathering coreference-related papers (such as LAW, DiscoMT or EVENTS), we believed that the time was ripe for a new workshop on the single topic of coreference resolution. The workshop would bring together researchers interested in under-investigated coreference phenomena and willing to contribute both theoretical and applied computational work on coreference resolution, especially for languages other than English, less-researched forms of coreference, and new applications of coreference resolution.