Pre-experiments on Annotation of Russian Coreference Corpus

HSE University, 2015.

Toldova S., Azerkovich I., Grishina Yu., Ladygina A., Lyashevskaya O., Roytberg A. M., Sim G., Vasilyeva M.

Building benchmark corpora in the domain of coreference and anaphora resolution is an important task for developing and evaluating NLP systems and models. Our study assesses the feasibility of enriching corpora with information about coreference relations. The annotation procedure includes identifying the text segments subject to annotation (markables), marking their syntactic heads, and identifying coreferential links. Markables are classified according to their morphological, syntactic and referential structure. The annotation is performed manually, providing gold-standard data for high-level NLP tasks such as anaphora and coreference resolution. The paper reports on inconsistencies in selecting NPs of various types as markables, in setting their borders, and in constructing anaphoric pairs. We consider the types of NPs missed by one of the annotators, as well as the discourse and semantic factors that may have affected the annotators' judgements.
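To make the notion of border disagreement concrete, here is a minimal Python sketch, assuming a hypothetical representation of each markable as a half-open token span plus the index of its syntactic head. It compares two annotators' markable sets twice: once requiring identical borders and once matching on heads only, scoring agreement with the Dice coefficient (symmetric F1). The class, the functions and the scoring choice are illustrative assumptions, not the procedure used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markable:
    # Hypothetical representation: half-open token span [start, end) and head token index.
    start: int
    end: int
    head: int


def dice(common: int, n1: int, n2: int) -> float:
    # Symmetric F1 (Dice coefficient) between two sets; 1.0 if both are empty.
    return 2 * common / (n1 + n2) if n1 + n2 else 1.0


def markable_agreement(a: list[Markable], b: list[Markable]) -> tuple[float, float]:
    """Compare two annotators' markables by exact borders and by syntactic head only."""
    sa, sb = set(a), set(b)
    exact = dice(len(sa & sb), len(sa), len(sb))
    ha, hb = {m.head for m in a}, {m.head for m in b}
    by_head = dice(len(ha & hb), len(ha), len(hb))
    return exact, by_head


# Annotator B chose a shorter border for the second NP but the same head.
ann_a = [Markable(0, 2, 1), Markable(4, 7, 5)]
ann_b = [Markable(0, 2, 1), Markable(4, 6, 5)]
print(markable_agreement(ann_a, ann_b))  # → (0.5, 1.0)
```

Comparing the two scores separates disagreements about whether an NP is a markable at all from disagreements only about where its borders lie.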

Research target: Computer Science

Priority areas: IT and mathematics

Language: English


Keywords: Russian language; coreference; anaphora; anaphora resolution; corpus annotation; coreference corpus; inter-annotator agreement

Inducing verb classes from frames in Russian: morpho-syntax and semantic roles

Kashkin E. V., Computational Linguistics and Intellectual Technologies 2015 Vol. 21 P. 427-440

The paper presents clustering experiments on Russian verbs based on the statistical data drawn from the Russian FrameBank (framebank.ru). While lexicology has essentially abandoned the idea of syntactic transformations as the primary basis for grouping verbs into semantic classes (Apresjan 1967, Levin 1993), the hypothesis of the same lexical and syntactic distributional profiles underlying lexical ...

Added: September 30, 2015

Universal Dependencies for Russian: A New Syntactic Dependencies Tagset

Lyashevskaya O., Droganova K., Zeman D. et al., / HSE University. Series WP BRP "Linguistics". 2016. No. 44.

This paper presents the Universal Dependencies tagset (UD v1) as a new annotation scheme for Russian treebanks. The universal list of dependency relations was adopted and extended to cover certain language-specific syntactic constructions. The tagset was validated by converting two Russian treebanks, UD-Russian-SynTagRus and UD-Russian-Google, into the UD format. ...
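UD v1 treebanks are distributed in the CoNLL-U format: one token per line, ten tab-separated columns, with the dependency relation (DEPREL) in the eighth column. As a hedged sketch of one small step of such validation, assuming plain CoNLL-U input and the 40 universal relations of UD v1, the Python below flags relation labels that are neither universal nor a language-specific subtype (`base:subtype`) of a universal label. The function name and sample sentence are hypothetical, and this is not the official UD validator.

```python
# The 40 universal dependency relations of UD v1.
UD_V1_RELATIONS = {
    "acl", "advcl", "advmod", "amod", "appos", "aux", "auxpass", "case", "cc",
    "ccomp", "compound", "conj", "cop", "csubj", "csubjpass", "dep", "det",
    "discourse", "dislocated", "dobj", "expl", "foreign", "goeswith", "iobj",
    "list", "mark", "mwe", "name", "neg", "nmod", "nsubj", "nsubjpass",
    "nummod", "parataxis", "punct", "remnant", "reparandum", "root",
    "vocative", "xcomp",
}


def invalid_deprels(conllu: str) -> list[tuple[str, str]]:
    """Return (token ID, DEPREL) pairs whose base relation is not a UD v1 label.

    Subtypes such as "nmod:poss" are accepted when the part before the colon
    is a universal relation.
    """
    bad = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges like "1-2" and malformed lines
        deprel = cols[7]
        if deprel.split(":")[0] not in UD_V1_RELATIONS:
            bad.append((cols[0], deprel))
    return bad


rows = [
    ["1", "Мама", "мама", "NOUN", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "мыла", "мыть", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "раму", "рама", "NOUN", "_", "_", "2", "obj", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
sample = "# text = Мама мыла раму.\n" + "\n".join("\t".join(r) for r in rows)
print(invalid_deprels(sample))  # → [('3', 'obj')]
```

The label `obj` is flagged because UD v1 used `dobj` for direct objects; `obj` only appeared in UD v2.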

Added: December 14, 2016

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2019)

M.: Russian State University for the Humanities, 2019

The book includes 64 papers submitted to Dialogue 2019, the International Conference on Computational Linguistics and Intellectual Technologies, and presents a broad spectrum of theoretical and applied research on natural language description, language simulation, and the creation of applied computer technologies. ...

Added: October 16, 2019

Anaphoric annotation and corpus-based anaphora resolution: An experiment

Alexeeva S. V., Protopopova E. V., Bodrova A. A. et al., Computational Linguistics and Intellectual Technologies 2014 P. 562-571

The paper describes noun phrase and anaphora annotation in OpenCorpora and compares it with annotation in other corpora. We discuss the choice of representative texts for anaphoric annotation and the basic principles of syntactic annotation. For noun phrase annotation we followed the scheme introduced earlier for morphological annotation: it was carried out ...

Added: October 8, 2014

The Russian pronoun TAM as a means of anaphoric reference to an actant

Letuchiy A., Voprosy Jazykoznanija 2021 Vol. 2021 No. 4 P. 72-90

The present article addresses a special use of the Russian locative pronoun tam: the use represented in examples like ‘I talked to her — no understanding of the situation there’. In those contexts, tam refers to a participant (usually animate), rather than a location. After a brief sketch of other uses, I describe the rules and restrictions on the ...

Added: October 31, 2022

Quantitative methods in diachronic corpus studies: constructions with predicatives and a dative subject

Bonch-Osmolovskaya A. A., Computational Linguistics and Intellectual Technologies 2015 Vol. 1 No. 14(21) P. 80-95

The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that can potentially be used in such constructions. The methodological novelty of the approach is manifested in the ...

Added: April 15, 2015

The interpretation of Russian pronouns in counterfactual identity contexts: a corpus-based study

Tiskin D. B., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 30 – June 2, 2018). Issue 17(24). M.: Publishing Center of the Russian State University for the Humanities, 2018. P. 735-746.

This paper is a first step towards a corpus-based description of the semantics of Russian pronouns in intensional contexts. Having justified the use of corpora in (formal) semantic research, I delineate a particular issue within the topic: whether a given pronoun is interpreted de se or de re in counteridentity contexts. A counteridentity context is a ...

Added: February 19, 2019

Welcome to the club: Designing the inventory of semantic roles for adjectives

Lyashevskaya O., Kashkin E., Computational Linguistics and Intellectual Technologies 2016 No. 15 P. 440-454

The argument constructions of adjectives have largely been out of the scope of research on semantic roles in both theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles, it is important to determine whether the adjectival roles form a separate list or whether they can be seen ...

Added: December 14, 2016

Coreference in Russian Oral Movie Retellings (the Experience of Coreference Relations Annotation in “Russian CliPS” corpus)

Toldova S. Yu., Bergelson M. B., Khudyakova M. V., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, June 1–4, 2016). Issue 15. M.: RSUH Publishing House, 2016. P. 769-781.

The work deals with adapting the annotation system of RuCor, the Russian coreference corpus (used for written Russian), to the corpus of Russian oral narratives from the Russian Clinical Pear Stories Corpus (Russian CliPS) (Khudyakova et al., 2016). Russian CliPS is a corpus of retellings of the “Pear Stories” movie (Chafe, 1980) by clinical populations as compared to ...

Added: June 6, 2016

Proceedings of the First Workshop on Computational Approaches to Discourse

Association for Computational Linguistics, 2020

Added: November 18, 2020

Evaluating Anaphora and Coreference Resolution for Russian

Toldova S. Ju., Roytberg A., Nedoluzhko A. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Bekasovo, June 4–8, 2014). Issue 13(20). M.: RSUH Publishing House, 2014. P. 681-695.

The paper reports on the recent forum RU-EVAL, a new initiative for the evaluation of Russian NLP resources, methods and toolkits. The first two events were devoted to morphological and syntactic parsing, respectively. The third event is devoted to anaphora and coreference resolution. Seven participating IT companies and academic institutions submitted their results for anaphora ...
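The abstract does not describe how submissions were scored. Purely as an illustration, the sketch below computes precision, recall and F1 for anaphor-to-antecedent links under two assumptions of our own, not the forum's: every chain member after the first is an anaphor to be resolved, and a predicted link counts as correct whenever the anaphor and the chosen antecedent belong to the same gold coreference chain. The function name, mention IDs and chains are hypothetical.

```python
def evaluate_links(gold_chains, system_links):
    """Precision, recall and F1 for anaphor -> antecedent links.

    gold_chains: list of chains, each a list of mention IDs in text order.
    system_links: dict mapping an anaphor ID to its predicted antecedent ID.
    """
    chain_of = {m: i for i, chain in enumerate(gold_chains) for m in chain}
    # Assumption: every chain member after the first is an anaphor to resolve.
    gold_anaphors = {m for chain in gold_chains for m in chain[1:]}
    # Lenient criterion: the antecedent only has to be in the anaphor's gold chain.
    correct = sum(1 for ana, ant in system_links.items()
                  if ana in gold_anaphors and chain_of.get(ant) == chain_of[ana])
    p = correct / len(system_links) if system_links else 0.0
    r = correct / len(gold_anaphors) if gold_anaphors else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gold = [[1, 4, 7], [2, 5]]          # two hypothetical chains
system = {4: 1, 7: 4, 5: 1, 6: 2}   # 5 -> 1 crosses chains; 6 is not a gold anaphor
print(tuple(round(x, 3) for x in evaluate_links(gold, system)))  # → (0.5, 0.667, 0.571)
```

A stricter variant would accept only the nearest preceding member of the gold chain as the antecedent.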

Added: October 6, 2014

Anaphora Analysis based on ABBYY Compreno Linguistic Technologies

Skorinkin D., Starostin A., Bogdanov A. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual Conference “Dialogue 2014”. Issue 13 (20). M., 2014. P. 89-102.

This paper presents an anaphora analysis system that was an entry in the Dialogue 2014 anaphora analysis competition. The system is based on ABBYY Compreno linguistic technologies. For some of the tasks of this competition we used basic features of the Compreno technology, while others required building new rules and mechanisms or making adjustments ...

Added: November 28, 2015

Detecting ethnicity-targeted hate speech in Russian social media texts

Pronoza E., Panicheva P., Koltsova O. et al., Information Processing and Management 2021 Vol. 58 No. 6 Article 102674

Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user ...

Added: September 2, 2021

Quantitative approaches to the Russian language

Abingdon : Routledge, 2018

This edited collection presents a range of methods that can be used to analyse linguistic data quantitatively. A series of case studies of Russian data spanning different aspects of modern linguistics serve as the basis for a discussion of methodological and theoretical issues in linguistic data analysis. The book presents current trends in quantitative linguistics, ...

Added: October 11, 2016

Applying statistical tagging to Russian poetry

Starchenko A., Kazakevich L., Lyashevskaya O., / HSE University. Series WP BRP "Linguistics". 2018. No. 76.

Poetic texts pose a challenge to full morphological tagging and lemmatization, since poets seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of creative language play. In this paper we evaluate a number of probabilistic ...

Added: December 12, 2018

Towards to Automatic Text Adaptation in Russian Language

Karpov N., Sibirtseva V., / HSE University. Series WP BRP "Linguistics". 2014.

This article describes ways to use original texts from the Russian National Corpus, as well as news texts, for teaching Russian as a foreign language. It analyzes two years of work by CorpLings, a research group at the Higher School of Economics (Nizhny Novgorod and Moscow). Special attention is paid to the basic principles of the research part of the project ...

Added: December 10, 2014

Corpus Tools in Grammatical Studies of the Russian Language

Lyashevskaya O., M.: Yazyki Slavyanskoy Kultury (Languages of Slavic Culture), 2016

Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...

Added: March 26, 2015

Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

Stroudsburg, PA : Association for Computational Linguistics, 2017

Added: November 6, 2017

Automatic part-of-speech tagging for Russian using transformation-based learning

Kitov V. V., Scientific Works of the Free Economic Society of Russia 2014 Vol. 186 P. 228-235

This paper describes the application of the well-known transformation-based learning algorithm for automatic rule generation to part-of-speech tagging. The algorithm is applied to corpora of annotated Russian texts, and its accuracy as well as the most significant learned rules are reported. ...
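The core loop of transformation-based learning can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes a most-frequent-tag baseline, a single rule template ("change tag A to B when the previous tag is C"), toy training data invented for the example, and training sentences simply concatenated into one sequence. At each step it picks the rule with the largest net gain (errors fixed minus correct tags broken) and stops when no rule helps.

```python
from collections import Counter, defaultdict


def build_lexicon(words, gold):
    # Most frequent gold tag for each word form in the training data.
    counts = defaultdict(Counter)
    for w, t in zip(words, gold):
        counts[w][t] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words, lexicon, default="NOUN"):
    # Baseline tagging; unknown words get a (hypothetical) default tag.
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    # Rule (frm, to, prev): change frm to `to` when the previous tag is prev.
    # Conditions are read from the tags as they were before the rule fires.
    frm, to, prev = rule
    return [to if i > 0 and t == frm and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]


def learn_rules(words, gold, max_rules=10):
    lexicon = build_lexicon(words, gold)
    tags = initial_tags(words, lexicon)
    rules = []
    for _ in range(max_rules):
        # Every remaining error proposes the rule that would fix it.
        fixes = Counter((tags[i], gold[i], tags[i - 1])
                        for i in range(1, len(tags)) if tags[i] != gold[i])
        best, best_gain = None, 0
        for rule, n_fixed in fixes.items():
            frm, _, prev = rule
            # Correct tags the rule would overwrite count against it.
            n_broken = sum(1 for i in range(1, len(tags))
                           if tags[i] == frm and tags[i - 1] == prev and gold[i] == frm)
            if n_fixed - n_broken > best_gain:
                best, best_gain = rule, n_fixed - n_broken
        if best is None:  # no rule reduces the error count any further
            break
        rules.append(best)
        tags = apply_rule(tags, best)
    return lexicon, rules


def tag(words, lexicon, rules):
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Toy data: "стали" is a verb ("became") after a pronoun but a noun
# ("steel", genitive) after a preposition.
words = ["мы", "стали", "друзьями", "они", "стали", "старше", "нож", "из", "стали"]
gold = ["PRON", "VERB", "NOUN", "PRON", "VERB", "ADJ", "NOUN", "ADP", "NOUN"]

lexicon, rules = learn_rules(words, gold)
print(rules)                                 # → [('VERB', 'NOUN', 'ADP')]
print(tag(["из", "стали"], lexicon, rules))  # → ['ADP', 'NOUN']
```

Practical Brill-style taggers combine many templates (neighbouring words, tags two positions away, suffixes) and usually stop once the best net gain falls below a threshold.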

Added: March 16, 2016

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 29 – June 1, 2019)

M.: Publishing Center of the Russian State University for the Humanities, 2019

Added: October 16, 2019

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Moscow, May 29 – June 1, 2019). Issue 18 (25)

M.: Publishing Center of the Russian State University for the Humanities, 2019

The collection includes 27 papers from Dialogue 2019, the International Conference on Computational Linguistics and Intellectual Technologies, that were not included in the yearbook “Computational Linguistics and Intellectual Technologies” but were recommended by the Program Committee for presentation at the conference. Intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: December 10, 2019

Formation of Control Structures in Static Swarms

Karpov V. E., Karpova I. P., Procedia Engineering 2015 Vol. 100 P. 1459-1468

Solutions are proposed for the problems of leader selection and role distribution in homogeneous groups of robots. It is shown that the transition from a swarm to a collective of robots with a hierarchical organization is possible using exclusively local interaction. The local revoting algorithm is central to the leader selection procedure, while redistribution of roles can ...

Added: March 14, 2015

Particle Simulation for Predicting Effective Properties of Short Fiber Reinforced Composites

Skoptsov K. A., Sheshenin S., Galatenko V. V. et al., International Journal of Applied Mechanics 2016 Vol. 8 No. 2 P. 1650016-01-1650016-18

We present a method for evaluating elastic properties of a composite material produced by molding a resin filled with short elastic fibers. A flow of the filled resin is simulated numerically using a mesh-free method. After that, assuming that spatial distribution and orientation of fibers are not significantly changed during polymerization, effective elastic moduli of ...

Added: May 22, 2016

A software package for modeling physical processes in the computer-aided design of secondary power supplies for complex on-board systems

Sotnikova S., Dynamics of Complex Systems 2012 No. 3 P. 84-87

The article describes a software package for modeling physical processes that also supports identification of the parameters of a printed circuit assembly (the physical model). The on-board secondary power supply is implemented on the designed printed circuit assembly. Interfaces linking the control program with well-known modeling and optimization programs were developed for it. ...

Added: December 5, 2014