
Working paper

Coreference annotation in the Russian clinical Pear Stories Corpus: annotation features and preliminary results

Svetlana Yu. Toldova, Mira B. Bergelson, Elizaveta I. Ivtushok, Kira M. Shulgina, Mariya V. Khudyakova.
This work examines the distribution of different referential devices in spoken discourse produced by healthy speakers and people with aphasia, and compares it to written discourse. We discuss some special annotation issues for the corpus of Pear film retellings (Russian CliPS) by people with aphasia (PWA), people with right hemisphere damage (RHD), and healthy speakers (HP for healthy people) of Russian. The study summarizes the comprehensive annotation scheme developed for this task and the preliminary research on referential choice features based on the corpus. Comparing retellings and written texts, we found a significant difference in the use of basic coreferential expressions between the two. First, there is a significant difference in the distribution of basic NP types: speakers use reduced devices such as zero anaphora or bare nouns more frequently in retellings than in written texts. There are also differences in the distribution of finer-grained features such as the word order within an NP, the use of anaphoric and reduced expressions (demonstratives or zero NPs) for the first mention of an entity, and the inclusion of epistemic markers in NPs. We also found that the retellings produced by PWA and HP do not differ much in the distribution of basic NP types. However, a detailed analysis of different NP types that takes various disfluencies into account reveals some prominent differences between the two populations. These include differences in zero subject distribution, in the frequency of non-referential NP links, and in the frequency of coreference errors. While adapting the initial coreference annotation scheme, we concluded that, besides referential ambiguity, which is normally taken into account in spoken discourse analysis, and the basic taxonomy of referential devices (full NP vs. anaphoric pronoun vs. anaphoric zero), other features need to be considered