Discovering dialectal differences based on oral corpora

V. Andriyanets; M. Daniel; B. Pakendorf

P. 28–38.

This paper discusses a method to detect statistically significant linguistic differences between corpora while factoring in possible variability within the very corpora to be compared. Specifically, we compare two small corpora of dialects of Even, Bystraja and Lamunkhin Even, in an attempt to identify morphemes that are more frequent in either of the corpora. To investigate whether this difference might be due to an over-representation of a speaker who happens to be an outlier in terms of using a particular morpheme, we use DP, a measurement of evenness of the distribution of a specific linguistic feature across subcorpora of the same corpus.
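The abstract does not spell out how DP is computed. A minimal sketch, assuming DP here is the "deviation of proportions" dispersion measure (half the sum of absolute differences between each subcorpus's share of the feature and its share of the corpus), might look as follows. The function name, the per-speaker split, and all counts are hypothetical illustrations, not data from the paper.

```python
from typing import Sequence


def deviation_of_proportions(feature_counts: Sequence[int],
                             part_sizes: Sequence[int]) -> float:
    """Return DP for one feature across the parts of a single corpus.

    feature_counts[i] is how often the feature (e.g. a morpheme) occurs in
    part i; part_sizes[i] is the size of part i in tokens. Values near 0
    mean the feature is spread in proportion to part size; values near 1
    mean it is concentrated in a small portion of the corpus.
    """
    if len(feature_counts) != len(part_sizes):
        raise ValueError("feature_counts and part_sizes must have equal length")
    total_feature = sum(feature_counts)
    total_size = sum(part_sizes)
    if total_feature == 0 or total_size == 0:
        raise ValueError("feature and corpus totals must be non-zero")
    return 0.5 * sum(
        abs(count / total_feature - size / total_size)
        for count, size in zip(feature_counts, part_sizes)
    )


# Hypothetical subcorpora: three speakers with 1000, 1000 and 2000 tokens.
sizes = [1000, 1000, 2000]

# Feature used in proportion to each speaker's share of the corpus.
even_dp = deviation_of_proportions([10, 10, 20], sizes)

# Feature used by the first speaker only (a possible outlier).
skewed_dp = deviation_of_proportions([40, 0, 0], sizes)

print(even_dp, skewed_dp)
```

Under this reading, a morpheme that looks more frequent in one dialect corpus but has a high DP within that corpus may owe its frequency to one over-represented speaker rather than to the dialect as a whole.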

Language: English

Keywords: corpus linguistics; dialect; Even language; linguistic variation

Publication based on the results of:

Convergence processes in the history of language: phonetics, grammar, lexicon (2018)

In book

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, 30 May to 2 June 2018)

Issue 17(24). Moscow: Russian State University for the Humanities Publishing Center, 2018.

The populist text as an object of corpus research

Galochkin A. E., in: The Human in the System of Communications: Professional Communications in the Digital Age. N. A. Dobrolyubov Nizhny Novgorod State Linguistic University, 2023. P. 87–90.

This article discusses the phenomenon of populism in the context of corpus linguistics methods, which is of particular importance in the modern world. The relevance of this study is related to the growth of right-wing populism in European countries and the importance of understanding the mechanisms of populist discourse. The article analyzes studies aimed at ...

Added: November 16, 2024

The language of the German national minority of the Vyatka region: status and development prospects

Bukharov V., Baykova O. V., Voprosy yazykoznaniya 2016 No. 2 P. 75–89

The status and the vector of development of German dialects in Russia are an issue of fundamental importance not only for Russian dialectology but also for the linguistics of island dialects in general. The article examines an under-investigated group of German dialects of the Vyatka region, which occupy a special place in dialectology, for they experienced ...

Added: October 4, 2018

Predictive validity of progressive-aspect verb forms in English corpus linguistics

Popkova E., Sotsiosfera 2010 No. 4 P. 74–81

The article discusses the most recent trends in the development of the English progressive. A corpus-based approach to linguistic research is seen as an effective means of determining reliability of the data retrieved and helps track the major diachronic dynamic in the increasing frequency of the progressive aspect that has taken place since the beginning ...

Added: November 6, 2012

Constructing the image of Perm in social media comments

Matkin N., Kultura i tekhnologii 2021 Vol. 6 No. 1 P. 26–32

Perm saw many changes during 2019 and 2020, such as a transport reform, the construction of a zoo, and changes of governor and mayor. All these changes are reflected in the image of the city that is constructed in residents' minds. On the one hand, the image of the city is formed by the media; on the other hand ...

Added: October 23, 2021

Automatic part-of-speech tagging for Russian using transformation-based learning

Kitov V. V., Nauchnye trudy Volnogo ekonomicheskogo obshchestva Rossii 2014 Vol. 186 P. 228–235

This paper describes the application of the well-known transformation-based learning algorithm for automatic rule generation to the task of part-of-speech tagging. The algorithm is applied to corpora of annotated Russian texts, and its accuracy as well as the most significant rules are reported. ...

Added: March 16, 2016

The correspondence of N. S. Khrushchev and F. Castro during the Caribbean crisis: an attempt at computerized analysis

Gertsen A. S., in: Fourth Winter School on Humanities Informatics. Immanuel Kant Baltic Federal University, 2020. P. 92–97.

The article analyzes the letters of N. S. Khrushchev, First Secretary of the Central Committee of the CPSU and Chairman of the Council of Ministers of the USSR, and F. Castro Ruz, the leader of the Cuban revolution, written between October 26 and 31, 1962, on the topic of the Caribbean crisis and ...

Added: July 15, 2025

Corpora as indicators of (non-)existence

Piperski A., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (2015). Moscow: RSUH Publishing, 2015. P. 494–500.

This paper discusses the notions of acceptability, occurrence, grammaticality and existence, and focuses on the relationship between corpus linguistics and the question of the existence of lexical items. Since corpora are almost exclusively samples from larger populations, it is claimed that they cannot provide evidence for non-existence of words, collocations or constructions. This is because ...

Added: March 13, 2016

The corpus as a tool and as an ideology: on some lessons of modern corpus linguistics

Plungian V., Russkii yazyk v nauchnom osveshchenii 2008 No. 16 (2) P. 7–20

Added: November 12, 2023

Russian predicates selecting remarkable clauses: corpus-based approach and Gricean perspective

Zevakhina N., Dainiak A., in: Bridging Formal and Conceptual Semantics: Selected Papers of the BRIDGE Workshop 14, Studies in Language and Cognition 4. Dusseldorf University Press, 2017. P. 187–208.

This paper reports upon the study of the lexico-grammatical distribution of Russian matrix predicates selecting kakoj remarkable clauses (or so-called ‘embedded’ exclamatives) in the Russian National Corpus, with some cross-linguistic parallels. It reveals that Russian matrix predicates belong to four conceptual classes: perceptual, mental, emotive, and speech. It shows that the phenomenon of ‘embedded’ exclamatives ...

Added: March 8, 2016

An overview of morphosyntactic variation in the speech of Russian-Chuvash bilinguals: number, gender, case assignment and preposition drop

Grishanova A., Russian linguistics 2025 Vol. 49 Article 10

The purpose of this study is to present a summary of morphosyntactic variation and a detailed analysis of the phenomenon of preposition drop in the Russian speech of Chuvash bilinguals. Specifically, I investigate what underlying factors might condition the variation. I conduct a qualitative analysis of the data extracted from the corpus of Russian spoken ...

Added: July 10, 2025

Predicative-type pragmatic markers in the spontaneous spoken speech of members of different social groups

Zaides K., Sotsio- i psikholingvisticheskie issledovaniya 2020 No. 8 P. 40–47

The article examines how predicative-type pragmatic markers (знаешь/те 'you know', (я) не знаю 'I don't know', (я) (не) думаю (что) 'I (don't) think (that)', представь/те 'imagine', etc.) are used in the spontaneous spoken speech of members of different social groups. The material for the study was a working subcorpus formed from 150,000 tokens of the corpus of everyday Russian speech (in practice, dialogues) «Один речевой день» ("One Speech Day") and 150,000 tokens of the corpus ...

Added: February 3, 2022

Using computational linguistics methods to analyze literary texts

Avanesyan N. L., Fokina A., Chepovskiy A., in: Enterprise Engineering and Knowledge Management (ИП&УЗ-2024): Proceedings of the XXVII Russian Scientific Conference, 28–29 November 2024 / ed. by Yu. F. Telnov. Moscow: Plekhanov Russian University of Economics, 2024. P. 15–18.

The article is devoted to applying mathematical methods of corpus analysis to the study of literary texts. Using purpose-built corpora as examples, it demonstrates the possibilities of correspondence analysis and of the analysis of pairwise rank correlation coefficients for comparing the frequency characteristics of texts from different subcorpora. The described techniques give correlated results. They can be used both for linguistic research and for building sound training text sets for artificial intelligence tasks. ...

Added: December 19, 2024

Looking for contextual cues to differentiating modal meanings: A corpus-based study

Lyashevskaya O., Ovsjannikova M., Szymor N. et al., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 51–78.

The domain of modality is structurally diverse and may be described in multiple ways (for example, see Perkins, 1983; Wierzbicka, 1987; Hengeveld, 1988/2004; Sweetser, 1990; Bondarko, 1990; Bybee et al., 1994; van der Auwera and Plungian, 1998; Palmer, 2001; Hansen, 2004; Nuyts, 2006; Khrakovsky, 2007). The article reports on the Russian part of a larger survey ...

Added: October 24, 2017

Using TXM Platform for Research on Language Changes over Time: The Dynamics of Vocabulary and Punctuation in Russian Literary Texts

Lavrentiev A. M., Sherstinova T., Chepovskiy A. et al., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2021 Vol. 70 P. 69–89

The purpose of this paper is to test the methodological tools provided by TXM platform for research on dynamics of vocabulary and punctuation marks in diachronic corpora. TXM is a powerful text analysis software which provides both quantitative and qualitative features in a transparent open-source implementation. In this paper, we demonstrate how it can be ...

Added: June 24, 2021

Innovations in teaching foreign languages to law students: proceedings of the interuniversity research and practice conference (15 March 2012)

Russian Law Academy of the Ministry of Justice of the Russian Federation, 2012.

The article traces the centuries-long path by which Africanisms entered modern American English. For a deeper understanding of the lexical features of this variety of English, the author turns to the history of the United States, demonstrating its direct influence on the development of various layers of the American lexicon. ...

Added: October 11, 2015

Russian Minority Languages on the Web: Descriptive Statistics

Orekhov B., Krylova I., Popov I. et al., Computational Linguistics and Intellectual Technologies 2016 No. 15 (22) P. 452–461

An article about the minority languages of Russia on the Internet ...

Added: November 7, 2017

Corpus of Russian student texts: design and prospects

Zevakhina N., Dzhakupova S., in: Proceedings of the 21st International Conference on Computational Linguistics "Dialogue". Moscow: RSUH Publishing, 2015.

The Corpus of Russian Student Texts (CoRST) is a computational and research project started in 2013 at the Linguistic Laboratory for Corpora Research Technologies at HSE. It comprises a collection of Russian texts written by students from various Russian universities. Its main research goal is to examine language deviations viewed as markers of language change. ...

Added: May 20, 2015

Computational methods of analysis for determining the gender attribution of a text: a practical study

Khomenko A., in: The Cognitive-Discursive Paradigm in Linguistics and Related Sciences: Current Problems and Research Methodology: Proceedings of the X International Congress on Cognitive Linguistics, 17–20 September 2020. Vol. 2(41). Ural State Pedagogical University, 2020. P. 893–897.

This article discusses the application of an integrative approach to determining gender as part of solving problems in forensic linguistics. The author integrates methods from cognitive science, corpus linguistics and, more broadly, computational linguistics, as well as classical structural text analysis, to identify characteristics of male and female speech. ...

Added: August 11, 2021

Referential Choice: Predictability and Its Limits

Kibrik A. A., Khudyakova M., Dobrov G. B. et al., Frontiers in Psychology 2016 Vol. 7 No. 1429 P. 1–21

We report a study of referential choice in discourse production, understood as the choice between various types of referential devices, such as pronouns and full noun phrases. Our goal is to predict referential choice, and to explore to what extent such prediction is possible. Our approach to referential choice includes a cognitively informed theoretical component, ...

Added: September 28, 2016

The Second Genitive in Russian

Daniel M., in: Partitive cases and related categories. Berlin, NY: De Gruyter Mouton, 2014. Ch. 9 P. 347–377.

This paper is an overview of the so-called second genitive in Russian, a nominal form available for a minority of Russian nouns but widely used with these nouns in certain contexts. In many ways, the second genitive is a secondary case. Thus, it may always be substituted with a regular genitive form, while the opposite ...

Added: October 17, 2013

The cognitive term "frame": creating a dictionary entry based on a specialized text corpus

Khomenko A., Kulikova V. A., Babiy A. et al., Vestnik Novosibirskogo gosudarstvennogo universiteta. Seriya: Lingvistika i mezhkulturnaya kommunikatsiya 2022 Vol. 20 No. 4 P. 17–34

The study tests a specialized text corpus using the example of a group of cognitive linguistics terms with the hypernym frame. The corpus includes a subcorpus of scientific texts and a subcorpus of journalistic texts. The first one is represented by 15 journals indexed in the RSCI; the second one ...

Added: November 17, 2022

The corpus in foreign language teaching (based on English)

Gorina O. G., St. Petersburg: Svoe Izdatelstvo, 2014.

This publication illustrates the broad language-teaching potential of corpus linguistics in teaching profession-oriented communication in English. Extensive language material from a purpose-built corpus of professional discourse and from other corpus resources formed the basis for varied exercises, assignments, and studies that were used to develop lexical skills in the speaking and writing of students majoring in Regional Studies. Recommended for specialists: philologists, language-teaching methodologists, ...

Added: February 20, 2017

Corpus tools in grammatical studies of Russian

Lyashevskaya O., Moscow: Yazyki slavyanskoi kultury, 2016.

Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...

Added: March 26, 2015