Identification of Singleton Mentions in Russian

S. Toldova; Max Ionov

Ch. 5. P. 33–41.

This paper describes a pilot study on detecting singleton mentions in Russian texts. A noun phrase is considered a singleton mention if it is the only mention of some entity in a text. We discuss various morphosyntactic and lexical features, some of which were used for analogous tasks for English, and propose new features derived from discourse analysis. Having tested machine learning classifiers trained on the proposed features, we conclude that although their quality is significantly lower than for English, they still achieve rather high precision and can thus be helpful in various mention-tracking tasks.

Language: English

Keywords: coreference resolution, singleton mention

In book

CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016

Vol. 1886. Aachen: CEUR Workshop Proceedings, 2017.

Proceedings of the First Workshop on Computational Approaches to Discourse

Association for Computational Linguistics, 2020.

Added: November 18, 2020

Evaluation for morphologically rich language: Russian NLP

Toldova S., Lyashevskaya O., Bonch-Osmolovskaya A. A. et al., in: Proceedings on the International Conference on Artificial Intelligence (ICAI). Vol. 1. Las Vegas: CSREA Press, 2015. P. 300–306.

RU-EVAL is a biennial event organized in order to estimate the state of the art in Russian NLP resources, methods and toolkits and to compare various methods and principles implemented for Russian. Russian could be treated as an under-resourced language due to the lack of freely distributable gold standard corpora for different NLP ...

Added: December 9, 2015

Error analysis for anaphora resolution in Russian: new challenging issues for anaphora resolution task in a morphologically rich language

Anna Roytberg, Toldova S., Alina Ladygina et al., in: Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), co-located with NAACL 2016, San Diego, California, June 16, 2016. Stroudsburg, PA: Association for Computational Linguistics, 2016. P. 74–83.

This paper presents a quantitative and qualitative error analysis of Russian anaphora resolvers which participated in the RU-EVAL event. Its aim is to identify and characterize a set of challenging errors common to state-of-the-art systems dealing with Russian. We examined three types of pronouns: 3rd person pronouns, reflexive and relative pronouns. The investigation has shown ...

Added: December 7, 2016

Mention Detection for Improving Coreference Resolution in Russian Texts: A Machine Learning Approach

Toldova S., Ionov M., Computacion y Sistemas 2016 Vol. 20 No. 4 P. 681–696

The paper concerns discourse-new referent detection. The task of coreference resolution is essential in many text-mining applications. The focus in this task is to detect noun phrases (NPs) that refer to the same entity. In languages without articles, there are no overt grammatical clues in an NP for whether it introduces a new referent into ...

Added: December 27, 2016

Russian Coreference Corpus

Toldova S., Ladygina A., Azerkovich I. et al., in: Input a Word, Analyze the World. Newcastle upon Tyne: Cambridge Scholars Publishing, 2016. Ch. 7. P. 107–125.

The Russian coreference corpus was established in 2013/2014 and annotated with coreference and anaphoric relations. At present, the corpus consists of 185 texts of different genres: newswire articles and fiction texts. The texts were taken from Russian freely available online resources and manually annotated for anaphora and coreference. Our corpus was annotated taking into account language-specific ...

Added: October 15, 2016

Employing Wikipedia data for coreference resolution in Russian

Azerkovich I., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. P. 107–112.

Semantic information has been deemed a valuable resource for solving the task of coreference resolution by many researchers. Unfortunately, not much has been done in the direction of using this information when working with Russian data. This work describes the first step of research aimed at creating a coreference resolution system for Russian based on semantic data, concerned with ...

Added: September 5, 2018

Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), co-located with NAACL 2016, San Diego, California, June 16, 2016

Stroudsburg, PA: Association for Computational Linguistics, 2016.

Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference that people did in MUC, ACE, and OntoNotes, having the impression that coreference is a well-worn task owing in part to the large number of papers reporting results on the MUC/ACE/OntoNotes corpora. Given ...

Added: December 7, 2016

Coreference resolution for Russian: the impact of semantic features

Toldova S., Maxim Ionov, in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). M.: -, 2017. P. 339–348.

This paper presents the results of our experiments on building a general coreference resolution system for Russian. The main aim of those experiments was to set a baseline for this task for Russian using the standard set of features developed and tested for coreference resolution systems created for other languages. We propose several baseline systems, ...

Added: July 12, 2017

Evaluating Anaphora and Coreference Resolution for Russian

Toldova S.Ju., Roytberg A., Nedoluzhko A. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Bekasovo, June 4–8, 2014). Issue 13 (20). M.: RSUH Publishing House, 2014. P. 681–695.

The paper reports on the recent forum RU-EVAL ‒ a new initiative for evaluation of Russian NLP resources, methods and toolkits. The first two events were devoted to morphological and syntactic parsing, respectively. The third event is devoted to anaphora and coreference resolution. Seven participating IT companies and academic institutions submitted their results for anaphora ...

Added: October 6, 2014

Anaphora Analysis based on ABBYY Compreno Linguistic Technologies

Skorinkin D., Starostin A., Bogdanov A. et al., in: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue 2014”. Issue 13 (20). M.: ., 2014. P. 89–102.

This paper presents an anaphora analysis system that was an entry for the Dialogue 2014 anaphora analysis competition. The system is based on ABBYY Compreno linguistic technologies. For some of the tasks of this competition we used basic features of the Compreno technology, while others required building new rules and mechanisms or making adjustments ...

Added: November 28, 2015

Features for Discourse-New Referent Detection in Russian

Toldova S., Ionov M., in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016. Vol. 1. Issue 9623. Springer Publishing Company, 2018. P. 648–662.

This paper concerns discourse-new mention detection in Russian. Detecting the mention of an entity newly introduced into discourse might be helpful for different NLP applications such as coreference resolution, protagonist identification, summarization and different tasks of information extraction. In our work, we are dealing with Russian, where there are no grammatical devices, ...

Added: September 1, 2018

Coreference Chains in Czech, English and Russian: Preliminary Findings.

Toldova S., Nedoluzhko A., Novák M., in: Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference "Dialogue" (2015). M.: RSUH Publishing House, 2015. P. 474–486.

This paper is a pilot comparative study on coreference chaining in three languages, namely, Czech, English and Russian. We have analyzed 16 parallel English-Czech newspaper texts and 16 texts in Russian (similar to the English-Czech ones in length and topics). Our motivation was to find out what the linguistic structure of coreference chains in different ...

Added: December 9, 2015