Coreference Chains in Czech, English and Russian: Preliminary Findings.

Toldova S., Nedoluzhko A., Novák M.

P. 474–486.

This paper is a pilot comparative study of coreference chaining in three languages: Czech, English and Russian. We analyzed 16 parallel English-Czech newspaper texts and 16 Russian texts (similar to the English-Czech ones in length and topic). Our motivation was to find out what the linguistic structure of coreference chains is in different languages and which distinctions should be taken into account to advance the development of coreference resolution systems. Drawing on theoretical approaches to coreference, we based our research on the following assumption: the recognition of coreference links for different structural types of noun phrases is regulated by different language mechanisms. A second starting point was that different languages allow pronominal chains of different lengths, and that the properties of coreference chains differ across languages with different strategies for zero anaphora and different systems of definiteness marking. This work reports our first findings on comparing the distribution of structural NP types across the three languages under analysis.

Language: English

Keywords: coreference resolution, anaphora, coreference, cross-linguistic comparison

In book

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (2015)

Moscow: RSUH Publishing, 2015.

Pre-experiments on Annotation of Russian Coreference Corpus

Toldova S., Azerkovich I., Grishina Yu. et al. / NRU HSE. Series WP BRP "Linguistics". 2015.

Building benchmark corpora in the domain of coreference and anaphora resolution is an important task for developing and evaluating NLP systems and models. Our study is aimed at assessing the feasibility of enhancing corpora with information about coreference relations. The annotation procedure includes identification of text segments that are subject to annotation (markables), marking their ...

Added: December 15, 2015

Licensing Reflexivity: Unity and variation among selected Uralic languages

Volkova A. A., Utrecht: LOT, 2014.

This dissertation analyzes the reflexivity patterns in Uralic languages from the point of view of a minimalist approach to binding. The languages under consideration are five Uralic languages spoken in the Russian Federation: Meadow Mari, Komi-Zyrian, Khanty, Besermyan Udmurt, and Erzya. The empirical data were compiled during fieldwork, and are used to test and assess ...

Added: October 26, 2014

Referential coherence of academic texts: a corpus-based analysis of L2 research papers in management

Elizaveta Smirnova, Language Teaching Research 2019

This paper focuses on referential coherence which is seen as a crucial attribute of effective academic writing. I report findings from a corpus study of Russian students' use of anaphoric expressions in their research proposals which is compared to a reference corpus comprising research articles published in peer-reviewed journals. I hypothesise that learners use anaphora ...

Added: December 5, 2017

Kendisi revisited

Rudnev P., in: Donum semanticum: Opera linguistica et logica in honorem Barbarae Partee a discipulis amicisque Rossicis oblata. M.: Languages of Slavic Culture, 2015. P. 263–271.

Added: September 12, 2017

Russian Coreference Corpus

Toldova S., Ladygina A., Azerkovich I. et al., in: Input a Word, Analyze the World. Newcastle upon Tyne: Cambridge Scholars Publishing, 2016. Ch. 7. P. 107–125.

The Russian coreference corpus was established in 2013/2014 and annotated with coreference and anaphoric relations. At present, the corpus consists of 185 texts of different genres: newswire articles and fiction texts. The texts were taken from Russian freely available online resources and manually annotated for anaphora and coreference. Our corpus was annotated taking into account language-specific ...

Added: October 15, 2016

Employing Wikipedia data for coreference resolution in Russian

Azerkovich I., in: Artificial Intelligence and Natural Language, 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Issue 930. Switzerland: Springer, 2018. P. 107–112.

Semantic information has been deemed a valuable resource for solving the task of coreference resolution by many researchers. Unfortunately, not much has been done in the direction of using this data when working with Russian data. This work describes the first step of a research, attempting to create a coreference resolution system for Russian based on semantic data, concerned with ...

Added: September 5, 2018

Referential Coherence of Academic Texts: A Corpus-Based Analysis of L2 Research Papers in Management

Smirnova E. A., Journal of Language and Education 2019 Vol. 5 No. 4 P. 112–127

This paper focuses on referential coherence, which is seen as a crucial attribute of effective academic writing. Findings are reported from a corpus study of Russian students’ research proposals. The learners’ use of anaphoric expressions is compared with a reference corpus, which comprises research articles published in peer-reviewed journals. It was hypothesised that learners use ...

Added: December 21, 2019

The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies

Rzymski C., Tresoldi T., Greenhill S. et al., Scientific Data 2020 Vol. 7 P. 13

Advances in computer-assisted linguistic research are greatly influencing and reshaping linguistic investigation. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. In this work we present CLICS, ...

Added: November 1, 2019

Syntactic structure of participial clauses in Meadow Mari

Volkova A. A. / Series WP BRP "Basic research program". 2017. No. 54.

It is generally assumed that the syntactic structure of participial relative clauses is impoverished, “reduced” in comparison to that of regular RCs (see a. o. Burzio 1981; Chomsky 1981; Hazout 2001; Siloni 1995; Stowell 1981). Participial RCs are often analysed as VP-like structures (for some, embedded under a nominalizing node, Doron & Reintges 2005; Hazout ...

Added: April 15, 2017

Reflexivity in Meadow Mari: Binding and Agree

Volkova A. A., Studia Linguistica 2017 Vol. 71 No. 1-2 P. 178–204

According to the Canonical Binding Theory (Chomsky 1981), anaphors must be bound in their local domain and pronominals must be free. The discovery of “long-distance anaphors” (e.g. Thrainsson 1976, Giorgi 1984), which violate the locality condition, induced the search for independent criteria. Giorgi (1984:310) proposed a widely adopted criterion: “pronouns can have split antecedents and ...

Added: April 2, 2017

Demonstrative anaphora in multimodal communication

Nikolaeva Yu. V., Evdokimova A. A., Budennaya E., in: Computational Linguistics and Intellectual Technologies. Issue 20 (27): Supplementary volume. RSUH Publishing, 2021. P. 1130–1143.

The article examines how anaphoric demonstrative expressions interact with co-occurring hand and head gestures, based on the multimodal corpus RUPEX. The analysis revealed a number of correlations between the speaker's role (Narrator / Reteller / Commentator) and the speaker's nonverbal behavior. The Reteller, who had not seen the film, was found to resort to demonstrative anaphora more often than the Narrator. Apparently, by doing so the Reteller sought to recreate ...

Added: August 11, 2021

Evaluation for morphologically rich language: Russian NLP

Toldova S., Lyashevskaya O., Bonch-Osmolovskaya A. A. et al., in: Proceedings on the International Conference on Artificial Intelligence (ICAI). Vol. 1. Las Vegas: CSREA Press, 2015. P. 300–306.

RU-EVAL is a biennial event organized in order to estimate the state of the art in Russian NLP resources, methods and toolkits and to compare various methods and principles implemented for Russian. Russian could be treated as an under-resourced language due to the lack of freely distributable gold standard corpora for different NLP ...

Added: December 9, 2015

Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016), co-located with NAACL 2016, San Diego, California, June 16, 2016

Stroudsburg, PA: Association for Computational Linguistics, 2016.

Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference that people did in MUC, ACE, and OntoNotes, having the impression that coreference is a well-worn task owing in part to the large number of papers reporting results on the MUC/ACE/OntoNotes corpora. Given ...

Added: December 7, 2016

Referential coherence of academic texts: A corpus-based analysis of L2 research papers in management

Smirnova E. A., Journal of Language and Education 2019

This paper focuses on referential coherence which is seen as a crucial attribute of effective academic writing. I report findings from a corpus study of Russian students’ research proposals. The learners’ use of anaphoric expressions is compared with that in a reference corpus which comprises research articles published in peer-reviewed journals. I hypothesise that learners ...

Added: October 20, 2019

Pleonastic participles in contemporary Russian speech: functions and development trends

Yu. M. Kuvshinskaya, N. A. Zevakhina, Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2023 Vol. 19 No. 1 P. 138–192

The paper studies tendencies in the use of full single (i.e. without their arguments) redundant participles in the attributive position in the Russian written discourse. Relying upon the data of the Russian National Corpus and the Corpus of Russian Student Texts, as well as a number of the examples collected from various written sources, the ...

Added: December 8, 2022

Non-canonical control in Russian converbial clauses

Zhukova S. Yu., Zevakhina N., Slioussar N. et al., Russian Linguistics 2020 Vol. 44 No. 2 P. 129–143

It has been acknowledged that the null subject of a converbial clause in Russian is canonically controlled by the Nominative subject of a main clause (Nominative subject control). Non-Nominative control has been considered ungrammatical. On the basis of two experiments (acceptability rating and speeded grammaticality judgement tasks) the paper shows that the non-Nominative control (by ...

Added: April 13, 2020

Coreference in Russian Oral Movie Retellings (the Experience of Coreference Relations Annotation in the “Russian CliPS” Corpus)

Toldova S. Yu., Bergelson M. B., Khudyakova M. V., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, July 1–4, 2016). Issue 15. Moscow: RSUH Publishing, 2016. P. 769–781.

The work deals with adapting the annotation system of the Russian coreference corpus RuCor (used for written Russian) to the corpus of Russian oral narratives from the Russian Clinical Pear Stories Corpus (Russian CliPS) (Khudyakova et al., 2016). Russian CliPS is a corpus of retellings of the “Pear Stories” movie (Chafe, 1980) in Russian by clinical populations as compared to ...

Added: June 6, 2016

Identification of Singleton Mentions in Russian

Toldova S., Max Ionov, in: CLLS 2016. Computational Linguistics and Language Science. Proceedings of the Workshop on Computational Linguistics and Language Science. Moscow, Russia, April 26, 2016. Vol. 1886. Aachen: CEUR Workshop Proceedings, 2017. Ch. 5. P. 33–41.

This paper describes a pilot study of the problem of detecting singleton mentions in Russian texts. A noun phrase is considered a singleton mention if it is the only referent of some entity. We discuss various morphosyntactic and lexical features, some of which were used for analogous tasks for English and propose new features derived ...

Added: November 9, 2017

Coreference resolution for Russian: the impact of semantic features

Toldova S., Maxim Ionov, in: Computational Linguistics and Intellectual Technologies. International Conference "Dialogue 2017" Proceedings. Vol. 1. Issue 16 (23). M., 2017. P. 339–348.

This paper presents the results of our experiments on building a general coreference resolution system for Russian. The main aim of those experiments was to set a baseline for this task for Russian using the standard set of features developed and tested for coreference resolution systems created for other languages. We propose several baseline systems, ...

Added: July 12, 2017

Interpretation of Russian pronouns in counterfactual identity contexts: a corpus study

Tiskin D. B., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, May 30 – June 2, 2018). Issue 17 (24). Moscow: Russian State University for the Humanities Publishing Center, 2018. P. 735–746.

This paper is a first step towards a corpus-based description of the semantics of Russian pronouns in intensional contexts. Having justified the use of corpus in (formal) semantic research, I delineate a particular issue within the topic: whether a given pronoun is interpreted de se or de re in counteridentity contexts. A counteridentity context is a ...

Added: February 19, 2019

Proceedings of the First Workshop on Computational Approaches to Discourse

Association for Computational Linguistics, 2020.

Added: November 18, 2020

Anaphora Analysis based on ABBYY Compreno Linguistic Technologies

Skorinkin D., Starostin A., Bogdanov A. et al., in: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue 2014”. Issue 13 (20). M., 2014. P. 89–102.

This paper presents an anaphora analysis system that was an entry for the Dialog 2014 anaphora analysis competition. The system is based on ABBYY Compreno linguistic technologies. For some of the tasks of this competition we used basic features of the Compreno technology, while others required building new rules and mechanisms or making adjustments ...

Added: November 28, 2015

Asymmetry in the use of the pronouns čto and kto and morphological animacy

Letuchiy A., Proceedings of the V. V. Vinogradov Russian Language Institute 2017 No. XIII P. 272–281

The paper focuses on one syntactic restriction on the use of the interrogative pronoun čto ‘what’. Contrary to kto ‘who’, čto disfavours constructions where it is syntactically parallel and co-referent to the anaphoric pronouns on ‘he’, ona ‘she’, and ono ‘it’. For instance, in the construction kogo “ego” (lit. ‘who “he”’), which the Russian speakers use to find out ...

Added: March 15, 2018