Referential Choice: Factors and Modeling

Khudyakova M., Kibrik A., Dobrov G., Linnik A.

P. 16–20.

Referential choice is the process of selecting an appropriate referential expression for a referent that the speaker/writer intends to mention at some point in discourse. Referential choice is governed by the referent's current status in the speaker's/writer's working memory. This status, in turn, is determined by a number of factors rooted in the discourse context and the referent's properties. Activation in working memory is immediately responsible for the coarse choice between full and reduced referential devices, which is the high-level distinction in the hierarchical organization of referential choice. Lower levels of granularity correspond to the choice between proper names and descriptions, and to still more refined options. Referential choice is thus a multi-factorial process. We have created a corpus of written texts in which many potentially relevant factors of referential choice are annotated. We also use another corpus in which the same texts are annotated for discourse structure, since rhetorical distance, measured on the basis of hierarchical discourse structure, is known to be a powerful factor of referential choice. We have modeled referential choice in the corpus with a variety of machine learning algorithms. The accuracy of prediction for the choice between full and reduced referential devices is close to 90%, and for the three-way choice between pronouns, descriptions, and proper names it is close to 80%. We also experimented with reducing the set of factors and explored the phenomenon of non-categorical, that is, probabilistic, referential choice.

Language: English

Keywords: corpus linguistics, referential choice, cognitive linguistics

In book

LATEUM 2013. Conference Proceedings. ELT and Linguistics 2013: New Strategies for Better Solutions

M.: Max press, 2013.

Russian sociology in the context of the digitalization of society: results of an analysis of a corpus of scientific texts

Смирнов А. В., Социологические исследования 2023 No. 4 P. 39–50

Using the analysis of a corpus of texts from eight leading Russian sociological journals, the article examines the impact of the digitalization of society on sociology in 2000–2021. Frequency analysis of 13.8 thousand scientific texts tracked the introduction of concepts related to digitalization into academic circulation. The article reveals the differences between the journals, due ...

Added: March 18, 2026

Promotional adjectives in grant proposal abstracts: a corpus study

Tulyakov D., Permyakova T. M., Balezina E., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2025 Vol. 24 No. 6 P. 58–67

By effectively integrating promotional discourse into grant proposal abstracts, researchers can more compellingly present their ideas and increase their chances of securing funding. Implications of promotional adjectives in grant writing might differ across various research fields. This study aims to explore the use of promotional adjectives in abstracts of research grant proposals in six research ...

Added: March 2, 2026

Dynamics of the perception of squares in urban space by native speakers of Russian (a comparative analysis based on Russian National Corpus data)

Belova P., in: Актуальные вопросы лингвистики и литературоведения: сборник научных статей по материалам международной научной конференции памяти доктора филологических наук, профессора Л.А. Араевой (6–8 февраля 2025).: Кемеровский государственный университет, 2025. P. 155–160.

This article presents research results on how the perception of squares in urban space has changed in the Russian linguistic picture of the world, from the second half of the 20th century to the present. Turning to the subcorpus of literary texts of the second half of the 20th century and the 21st ...

Added: February 4, 2026

Preposition drop in Russian spoken by Mari and Beserman bilinguals

Yakovleva A., Kosheliuk N., Moroz G., International Journal of Bilingualism 2025 P. 1–19

Aims and Research Questions: In this paper, we present a corpus-based study of preposition drop (p-drop) in the speech of Mari-Russian and Beserman-Russian bilinguals compared to the speech of Russian monolinguals. Based on data from spoken corpora, we demonstrate that the prepositions v ‘in’, k ‘to’, s ‘with’ are omitted in the speech of bilinguals ...

Added: November 26, 2025

Variation between godov and let in Russian dialects: a corpus study

Zemicheva S., Moroz G., Naccarato C., Вопросы языкознания 2025 No. 6 P. 7–34

The suppletive form let in the paradigm of the noun god ‘year’ distinguishes Russian from the other East Slavic languages. In Russian dialects, however, the variant godov may be used instead of let. Data from the panchronic subcorpus of the Russian National Corpus show that the form godov, first attested in the 15th century, remained peripheral throughout the history of Russian, was used predominantly in non-fiction texts in the 17th–18th centuries, and in ...

Added: November 12, 2025

Automatic Annotation of Discourse and Speech Formulas in Internet Communication: A Telegram Comment Corpus

Maslenikova A., Tatiana I. Popova, in: 27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part I. Speech and Computer. Lecture Notes in Artificial Intelligence 16187. Vol. 16187: Lecture Notes in Artificial Intelligence.: Springer, 2025. P. 278–292.

This article presents a system for the automatic processing of user comments aimed at annotating speech and discourse formulas that actively function in everyday interaction, including digital communication. A Python-based program using the Telegram API was developed to automate the collection, filtering, and annotation of empirical data. In addition to building a user corpus, the ...

Added: October 19, 2025

27th International Conference, SPECOM 2025, Szeged, Hungary, October 13–15, 2025, Proceedings, Part II. Speech and Computer. Lecture Notes in Artificial Intelligence 16188

Springer, 2025.

Added: October 19, 2025

Psychological Applications Conference and Trends (InPACT 2022).

inScience Press, 2022.

Added: September 23, 2025

Variation in a Narrative Corpus of Mano and Kpelle: Contact-Induced or Not?

Khachaturyan M., Konoshenko M., Moroz G. et al., in: N’yng-dyuumgu, n’yng-ngafq: Festschrift for Ekaterina Gruzdeva. Vol. 126.: Helsinki: Studia Orientalia, 2025. P. 35–59.

This paper explores a corpus of spontaneous narratives and narrative retellings told by children and adults in Mano and Kpelle, two contacting Mande languages. It focuses on quotative constructions as a key point of grammatical dissimilarity between Mano and Kpelle. In the Mano speech of some bilingual children, however, these constructions are found to manifest ...

Added: September 5, 2025

The correspondence between N. S. Khrushchev and F. Castro during the Caribbean Crisis: an experiment in computer-assisted analysis

Герцен А. С., in: Четвёртая зимняя школа по гуманитарной информатике.: Балтийский федеральный университет им. Иммануила Канта, 2020. P. 92–97.

The article analyzes the letters of N. S. Khrushchev, First Secretary of the Central Committee of the CPSU and Chairman of the Council of Ministers of the USSR, and F. Castro Ruz, the leader of the Cuban revolution, written between October 26 and 31, 1962, on the topic of the Caribbean crisis and ...

Added: July 15, 2025

An overview of morphosyntactic variation in the speech of Russian-Chuvash bilinguals: number, gender, case assignment and preposition drop

Grishanova A., Russian linguistics 2025 Vol. 49 Article 10

The purpose of this study is to present a summary of morphosyntactic variation and a detailed analysis of the phenomenon of preposition drop in the Russian speech of Chuvash bilinguals. Specifically, I investigate what underlying factors might condition the variation. I conduct a qualitative analysis of the data extracted from the corpus of Russian spoken ...

Added: July 10, 2025

Do Formal Stance Strategies Reveal Disciplinary Variation in Professional Scientific Writing?

Smirnova E. A., Pérez-Guerra J., International Journal of Applied Linguistics 2025 Vol. 35 No. 3 P. 1242–1261

Stance in academic discourse has been extensively studied, with numerous investigations indicating that its expression varies across disciplines, depending on the authors’ intention to either enhance or diminish their voice or presence (e.g. It seems fairly certain versus This is based on the belief that...). This paper hypothesises that stance can be viewed as a ...

Added: April 10, 2025

The Russian language in contact: Turkic-Russian language interaction. Part 1. A sociolinguistic and corpus study

Резанова З. И., Artemenko E., Диброва В. С. et al., Томск: Издательство Томского государственного университета, 2024.

The monograph presents the strictly linguistic, sociolinguistic, and psycholinguistic aspects of the interaction between Russian and three Turkic languages: Shor, Khakas, and Tatar (the Siberian variety). It characterizes the ways in which the Turkic languages influence the speech practice and the cognitive processes of speech production and perception of Russian-speaking bilinguals. It presents the methods of data collection and processing used in building a sociolinguistic database and a morphologically annotated bimodal corpus of the Russian oral speech of bilinguals, ...

Added: April 7, 2025

The ‘adverb-ly adjective’ construction in English: meanings, distribution and discourse functions

Taboada M., Goddard C., Trnavac R., English Language and Linguistics 2025 Vol. 29 No. 1 P. 102–131

We investigate a class of adjective phrases composed of a deadjectival adverb ending in -ly and an adjective head (e.g. staggeringly incompetent, absolutely terrific, fiscally responsible), a compact construction whereby two adjectives may jointly contribute to evaluative meaning. Using corpus methodologies on more than 1 million examples and relying on semantic analyses of about 1,000 instances, we propose that the ...

Added: April 4, 2025

Creation and Analysis of the Multimedia Russian Corpus for Gesture Research

Rakhilina E. V., Cienki A., in: The Cambridge Handbook of Gesture Studies.: Cambridge University Press, 2024. P. 249–272.

The chapter considers gesture studies in relation to corpus linguistic work. The focus is on the Multimedia Russian Corpus (MURCO), part of the Russian National Corpus. The chapter includes a brief biography of the creator of this corpus, Elena Grishina. The compilation of the corpus out of a set of Russian classic feature films and ...

Added: February 13, 2025

Using methods of computational linguistics to analyze literary texts

Аванесян Н. Л., Fokina A., Chepovskiy A., in: Инжиниринг предприятий и управление знаниями (ИП&УЗ-2024) : сборник научных трудов XXVII Российской научной конференции. 28–29 ноября 2024 г. / под науч. ред. Ю. Ф. Тельнова. – Москва : ФГБОУ ВО «РЭУ им. Г. В. Плеханова», 2024.: М.: ФГБОУ ВО "РЭУ им. Г.В. Плеханова", 2024. P. 15–18.

The article is devoted to the application of mathematical methods of corpus analysis to the study of literary texts. Using purpose-built corpora as examples, it demonstrates how correspondence analysis and the analysis of pairwise rank correlation coefficients can be used to compare the frequency characteristics of texts from different subcorpora. The described techniques yield correlated results. They can be used both for linguistic research and for creating sound training text sets for artificial intelligence tasks. ...

Added: December 19, 2024

Corpus linguistics at the present stage

Plungian V., Вестник Российской академии наук 2024 Vol. 94 No. 9 P. 787–794

The article gives a general overview of corpus linguistics, its history, its methods, and its influence on contemporary ideas about the study of language, an influence usually referred to as the “corpus revolution”. ...

Added: December 16, 2024

The populist text as an object of corpus research

Галочкин А. Е., in: ЧЕЛОВЕК В СИСТЕМЕ КОММУНИКАЦИЙ: ПРОФЕССИОНАЛЬНЫЕ КОММУНИКАЦИИ В ЦИФРОВУЮ ЭПОХУ.: Нижегородский государственный лингвистический университет им. Н.А. Добролюбова, 2023. P. 87–90.

This article discusses the phenomenon of populism in the context of corpus linguistics methods, which is of particular importance in the modern world. The relevance of this study is related to the growth of right-wing populism in European countries and the importance of understanding the mechanisms of populist discourse. The article analyzes studies aimed at ...

Added: November 16, 2024

Коньячку бы, да до дому: a chronology of the development of some forms of the second genitive case

Budennaya E., Труды института русского языка им. В.В. Виноградова 2024 No. 2(40) P. 261–282

Based on material from the Russian National Corpus, the article discusses the diachronic development of constructions with the Russian second genitive case in three types of contexts: 1) with nominal quantifiers; 2) with the preposition bez ‘without’; 3) with the preposition do ‘towards’. The data obtained for Russian are compared with data from other languages (Finnic and several Turkic), in which there is a tendency to use the partitive ...

Added: October 4, 2024