Anaphoric annotation and corpus-based anaphora resolution: An experiment

Alexeeva S. V.; Protopopova E. V.; Bodrova A. A.; Volskaya S. A.; Krylova I. V.; Chuchunkov A. S.; Granovsky D. V.; Bocharov V. V.

Компьютерная лингвистика и интеллектуальные технологии. 2014. P. 562–571.

The paper describes noun phrase and anaphora annotation in OpenCorpora and compares it with that in other corpora. We discuss the choice of representative texts for anaphoric annotation and the basic principles of syntactic annotation. For noun phrase annotation we followed the two-stage scheme introduced earlier for morphological annotation: first, all noun phrases and some other syntactic units were annotated by a heterogeneous group of people; then a linguist compared all markup results and either chose the best one or corrected mistakes. We present some annotation results and cases of annotator disagreement, and proceed to introduce our data-driven anaphora resolution system based on decision trees. We then list the features used to fit the classifier and discuss their relevance and some changes that improved classifier performance. We also present our rule-based approach to automated noun phrase extraction using Tomita parser. A baseline for anaphora resolution is introduced and compared with our results.
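The abstract does not say which baseline was used. As a rough illustration only, a common baseline for pronominal anaphora picks the nearest preceding noun phrase that agrees with the pronoun in gender and number. The sketch below assumes that rule; the `Mention` type, its fields, and the function name are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mention:
    """A noun phrase or pronoun with hypothetical agreement features."""
    text: str
    position: int  # token offset in the document
    gender: str    # e.g. "m", "f", "n"
    number: str    # "sg" or "pl"


def nearest_agreeing_antecedent(
    pronoun: Mention, noun_phrases: Sequence[Mention]
) -> Optional[Mention]:
    """Return the closest preceding noun phrase that agrees with the pronoun.

    This is an assumed baseline rule, not the one described in the paper.
    """
    candidates = [
        cand for cand in noun_phrases
        if cand.position < pronoun.position
        and cand.gender == pronoun.gender
        and cand.number == pronoun.number
    ]
    # Among agreeing candidates, the closest one has the largest position.
    return max(candidates, key=lambda cand: cand.position, default=None)


# Toy data invented for illustration.
noun_phrases = [
    Mention("the linguist", 0, "f", "sg"),
    Mention("the corpus", 3, "n", "sg"),
    Mention("the annotators", 6, "m", "pl"),
]
pronoun = Mention("she", 9, "f", "sg")
antecedent = nearest_agreeing_antecedent(pronoun, noun_phrases)
```

A learned classifier such as a decision tree would replace the fixed agreement rule with a decision over pair features, but the candidate generation step (preceding noun phrases) would look much the same.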

Research target: Computer Science; Philology and Linguistics

Priority areas: humanitarian; IT and mathematics

Language: English

Keywords: crowdsourcing, corpus, anaphora, corpora, linguistics, anaphora resolution, syntactic annotation

Национальный корпус русского языка как основа новаторских электронных учебников

Sibirtseva V., Khomenko A., Baranova J., Образовательные технологии и общество 2013 Т. 16 № 3 С. 508–521

The article reports on the development of the student and teacher research group of the National Research University Higher School of Economics entitled "Corplingui (Nizhny Novgorod-Moscow)". The work concerns research in computer and corpus linguistics. Development focuses primarily on the creation of interactive resources based on the materials of the Russian National Corpus. The ...

Added: October 4, 2013

Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science

Springer, 2015.

16th International Conference, CICLing 2015, Cairo, Egypt, April 14-20, 2015, Proceedings, Part I ISBN: 978-3-319-18110-3 (Print) 978-3-319-18111-0 (Online) ...

Added: April 23, 2015

Pre-experiments on Annotation of Russian Coreference Corpus

Toldova S., Azerkovich I., Гришина Ю. et al., / NRU HSE. Series WP BRP "Linguistics". 2015.

Building benchmark corpora in the domain of coreference and anaphora resolution is an important task for developing and evaluating NLP systems and models. Our study is aimed at assessing the feasibility of enhancing corpora with information about coreference relations. The annotation procedure includes identification of text segments that are subject to annotation (markables), marking their ...

Added: December 15, 2015

Dark personalities on Facebook: Harmful online behaviors and language

Bogolyubova O., Panicheva P., Tikhonov R. et al., Computers in Human Behavior 2018 Vol. 78 P. 151–159

The goal of this paper was to assess the connection between dark personality traits and engagement in harmful online behaviors in a sample of Russian Facebook users, and to describe the language they use in online communication. A total of 6724 individuals participated ...

Added: February 18, 2019

Speech and Computer. 21st International Conference, SPECOM 2019, Istanbul, Turkey, August 20–25, 2019, Proceedings

Switzerland: Springer, 2019.

This volume contains a collection of submitted papers presented at the conference, which were thoroughly reviewed by members of the Program Committee consisting of more than 100 top specialists, as well as an invited paper by Prof. Scharenborg. Each paper was reviewed, single blind, by two to four committee members (three reviewers on the average) and then discussed by ...

Added: October 29, 2019

Computational Linguistics and Intellectual Technologies

M.: Russian State University for the Humanities, 2019.

The book includes 61 reports of the International conference on computer and intellectual technology "Dialogue-2019", representing a wide range of theoretical and applied research in the field of natural language description, modeling of language processes, creating practically applicable computer linguistic technologies. For specialists in the field of theoretical and applied linguistics and intellectual technologies. ...

Added: June 12, 2019

Fake opinion detection: how similar are crowdsourced datasets to real data?

Fornaciari T., Cagnina L., Россо П. et al., Language Resources and Evaluation 2020 Vol. 54 No. 4 P. 1019–1058

Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models ...

Added: October 29, 2020

Digital Russia: The Language, Culture and Politics of New Media Communication

L.: Routledge, 2014.

This book provides a comprehensive analysis of the ways in which new media technologies have shaped language and communication in contemporary Russia. It traces the development of the Russian-language internet (Runet) from late-Soviet cybernetics to the advent of Twitter and explores the evolution of web-based communication practices, showing how they have both shaped and been ...

Added: December 11, 2013

Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 17 июня — 20 июня 2020 г.)

М.: Изд-во РГГУ, 2020.

Papers from the Annual International Conference “Dialogue” (2020). Issue 19 ...

Added: June 26, 2020

RUSSE2018: a Shared Task on Word Sense Induction for the Russian Language

Panchenko A., Lopukhina A., Ustalov D. et al., Компьютерная лингвистика и интеллектуальные технологии 2018 No. 17 P. 547–564

The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic ...

Added: June 7, 2018

Машина в духе. Гуссерлевское обоснование и ограничение искусственного интеллекта.

Холенштайн Э., Логос 2007 Т. 63 № 6 С. 176–196

The article attempts to apply Edmund Husserl's phenomenological psychology to the problem of the relationship between natural and artificial intelligence. ...

Added: July 7, 2015

Количественная оценка грамматической неоднозначности некоторых европейских языков

Klyshinskiy E., Логачёва В. К., Карпик О. В. et al., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Т. 18 № 1 С. 5–21

The grammatical ambiguity (multiple sets of grammatical features for one word form or coinciding surface forms of different words) can be of different types. We describe six classes of grammatical ambiguity: unambiguous, ambiguous by grammatical features, by part of speech, by lemma, by lemma and part of speech, and out-of-vocabulary words. These classes are presented ...

Added: December 11, 2019

Труды международной конференции "Корпусная лингвистика - 2019"

СПб.: Издательство Санкт-Петербургского университета, 2019.

The volume contains the papers presented at the International Scientific Conference "Corpus Linguistics-2019", held June 24-28, 2019, in St. Petersburg. ...

Added: July 8, 2019

Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог» (Москва, 29 мая — 1 июня 2019 г.)

М.: Издательский центр «Российский государственный гуманитарный университет», 2019.

The book includes 64 papers submitted to the International conference in computer linguistics and intellectual technologies Dialogue 2019 and presents a broad spectrum of theoretical and applied research of natural language description, language simulation, and creation of applied computer technologies. ...

Added: October 16, 2019

Proceedings of the Eleventh International Conference on Computational Creativity

Coimbra: Association for Computational Creativity, 2020.

Added: September 29, 2020

Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT 16)

Association for Computational Linguistics, 2017.

The volume includes papers presented at the 16th International Workshop on Treebanks and Linguistic Theories (TLT), which brings together developers and users of linguistically annotated natural language corpora. As ‘treebanks’ we consider any pairing of natural language data (spoken or written) with annotations of linguistic structure at various levels of analysis, ranging from e.g. morpho-phonology ...

Added: December 11, 2018

The 26th International Conference on Computational Linguistics (COLING 2016)

[s.n.], 2016.

Added: December 1, 2016

Innovative Use of NLP for Building Educational Applications

Stroudsburg, PA: Association for Computational Linguistics, 2019.

Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications ...

Added: October 5, 2020

Technological and Social Environments for Interactive Learning

Informing Science Press, 2013.

Technology Enhanced Learning (TEL) is a very broad and increasingly mature research field. It encompasses a wide variety of research topics, ranging from the study of different pedagogical approaches and teaching/learning strategies and techniques, to the application of advanced technologies in educational settings such as the use of different kinds of mobile devices, sensors and ...

Added: February 20, 2013

Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts

Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69–89

The purpose of this paper is to test the methodological tools provided by TXM platform for research on dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text analysis software which provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...

Added: June 24, 2021

Exploring the Effectiveness of Methods for Persona Extraction

Konstantin Zaitsev, / Series Computer Science "arxiv.org". 2024.

The paper presents a study of methods for extracting information about dialogue participants and evaluating their performance in Russian. To train models for this task, the Multi-Session Chat dataset was translated into Russian using multiple translation models, resulting in improved data quality. A metric based on the F-score concept is presented to evaluate the effectiveness ...

Added: September 26, 2024

Insights into the web based English learning projects

Frolova N., Frolov E. S., The Kazakh-American Free University Academic Journal 2017 P. 179–184

The article reflects the practical experience of enhancing the process of Academic English Writing teaching to undergraduate students by means of web tools. Along with theoretical analysis of the integration scheme of blended learning into the curriculum the article features empirical survey to confirm the efficiency of the project. The article contains a detailed description ...

Added: June 5, 2018

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics

Sofia: Springer, 2013.

Welcome to the 2013 Conference of the Association for Computational Linguistics! Our community continues to grow, and this year’s conference has set a new record for paper submissions. We received 1286 submissions, which is 12% more than the previous record; we are particularly pleased to see a striking increase in the number of short papers ...

Added: October 1, 2014

Онтологические модели ситуаций в задачах компьютерного контроля знаний иностранного языка

Demkin V. M., Sosnin A., Сусманова С. С., Онтология проектирования 2014 № 3(13) С. 63–76

Discussed in the paper are modern approaches to the design of complicated intellectual computer systems assessing foreign language proficiency, e.g. checking students’ academic progress in a higher educational establishment. The paper provides insight into the means to develop ontology-based situation models in the tasks requiring that a person’s command of English be assessed, which is ...

Added: October 24, 2012