Russian challenges for quantitative research

Kopotev M., Lyashevskaya O., Mustajoki A.

P. 3–29.

Although Russian is one of the most studied languages in the world, it has until recently received little quantitative attention. After a burst of research activity between 1960 and 1980, quantitative studies of Russian vanished. They are now reappearing in an entirely different context. Today, large and deeply annotated corpora are available for extended quantitative research, such as the Russian National Corpus, ruWac, and ruTenTen, to name just a few (websites for these and other resources are listed in a special section of the References). The present volume is intended to bridge the gap between the available data and the methods that can be applied to studying them.

Our goal is to present current trends in Russian quantitative linguistics, to evaluate research methods vis-à-vis Russian data, and to show both the advantages and the disadvantages of those methods. We especially encouraged our authors to focus on evaluating statistical methods and new models of analysis. New findings concern applicability, evaluation, and the challenges that arise from using quantitative approaches to Russian data. The goal of this volume is therefore twofold: a) to address the topic of quantitative analysis of the Russian language, and b) to present an evaluation of methods applied to Russian data.

Language: English

Keywords: Russian language; data analysis; corpus linguistics; Russian linguistics; experimental linguistics; quantitative linguistics; quantitative methods in linguistics

In book

Quantitative approaches to the Russian language

Abingdon: Routledge, 2018.

Systematization of equivalent pronunciation variants in contemporary Russian (based on orthoepic dictionaries)

Zubov V., Вопросы лексикографии 2026 No. 40 P. 64–86

The article addresses the problem of selecting and systematizing data for the study of pronunciation variation in contemporary Russian and proposes a solution in the form of a specialized database of codified equivalent pronunciation variants (e.g., simmétriya / simmetríya “symmetry”). The article presents a methodology for identifying, selecting, and organizing such variants into a database. ...

Added: July 23, 2026

Russian Pronouns with Focus Antecedents: Coreference and Binding in Corpora

Tiskin D., Компьютерная лингвистика и интеллектуальные технологии 2026 No. 24 P. 656–665

Despite considerable interest in the factors influencing the choice of pronoun (reflexive or personal) with an antecedent in Russian, the role of the anaphoric relation (coreference or semantic binding) has been understudied, and there are disagreements as to the acceptability of particular data points. To clarify things, I employ large corpora (Araneum and GICR) to study the ...

Added: July 19, 2026

Abstracts of the Fifteenth Shmelev Readings (on the centenary of the birth of Academician Dmitry Nikolaevich Shmelev). The Life of the Word: The Scholarly Legacy of Academician D. N. Shmelev in a Contemporary Context

Moscow: V. V. Vinogradov Russian Language Institute of the Russian Academy of Sciences, 2026.

A collection of abstracts of the Fifteenth Shmelev Readings (on the centenary of the birth of Academician Dmitry Nikolaevich Shmelev), The Life of the Word: The Scholarly Legacy of Academician D. N. Shmelev in a Contemporary Context. It covers various aspects of contemporary Russian linguistics, from historical lexicology to present-day transformations in the pragmatics and semantics of words. ...

Added: June 23, 2026

Why a poetic corpus is needed and how to use it

Korchagin K., Русская речь 2019 Vol. 6 P. 113–127

The Poetic Corpus within the Russian National Corpus is a tool for researchers of Russian poetry and poetic language. The corpus contains an extensive collection of Russian poetry of the 18th–20th centuries, reflects all notable poetic movements, and continues to grow. It has two types of annotation: grammatical and metrical. While the former coincides with the annotation in the main ...

Added: June 19, 2026

Syntactic functions of non-manuals in Russian Sign Language

Burkova S., Khristoforova E., Kimmelman V., in: Advances in Sign Language Corpus Linguistics. John Benjamins Publishing Company, 2023. P. 90–129.

This chapter presents the Russian Sign Language (RSL) Corpus and demonstrates its capabilities as a research tool by summarizing three corpus-based studies primarily focused on syntactic functions of nonmanual markers. The first study considers question marking in regular wh-questions and in question-answer pairs. It shows that the two constructions have very different nonmanual markers. The second study analyzes marking of ...

Added: June 3, 2026

Juxtapositional vs. possessive-like encoding in Russian specificational constructions

Logvinova N., Russian linguistics 2026 Vol. 50 Article 11

This paper presents the first in-depth corpus-based study of a previously overlooked syntactic variation in Russian: the competition between juxtapositional (Nominative) and possessive-like (Genitive) encoding of the second noun (the term) in specificational constructions (e.g., ponjatie čest’ (notion.NOM honor.NOM) vs. ponjatie česti (notion.NOM honor.GEN) ‘the notion of honor’). While typological research has established cross-linguistic preferences for one encoding strategy over another, intralinguistic variation ...

Added: May 18, 2026

Focus on vocabulary. The economics of tangible and intangible assets: a corpus-based dictionary and AI exercises for English

Gorina O. G., Kucherenko S., Larisa K. et al., St. Petersburg: Астерион, 2026.

This textbook is an integrated teaching and learning resource for English for Specific Purposes (ESP) in the field of economics of tangible and intangible assets. Its design employs (i) modern corpus linguistics methods, including frequency analysis and keyword extraction based on authentic texts reflecting current trends in professional discourse, and (ii) artificial intelligence technologies for ...

Added: May 16, 2026

Modern time series analysis methods in monitoring and forecasting the condition of artificial lift equipment

Neznanov A., Glushko A., Овчинников С. et al., in: Intelligent data analysis in the oil and gas industry. Moscow: ООО «Геомодель Развитие», 2024. P. 140–143.

With the development of monitoring systems, now we have the opportunity to collect key performance indicators of devices in the process of artificial lift. Every day a huge amount of telemetry is generated by our devices, which can be used to forecast the working mode and health state of the equipment after the process of ...

Added: April 29, 2026

Intelligent harmonization and transformation of artificial lift data based on open solutions

Neznanov A., Емельянов В., Glushko A. et al., in: Intelligent data analysis in the oil and gas industry. Moscow: ООО «Геомодель Развитие», 2024. P. 42–45.

The complexity of data processing in corporate information systems in the oil and gas industry is constantly growing due to increasing data quantity and heterogeneity. It requires the adoption of modern methodologies and tools for data and knowledge management. Many architectural solutions have already been tested by the OSDU consortium, but in the current conditions ...

Added: April 29, 2026

Russian sociology amid the digitalization of society: results of an analysis of a corpus of scholarly texts

Smirnov A., Социологические исследования 2023 No. 4 P. 39–50

Using the analysis of a corpus of texts from eight leading Russian sociological journals, the article examines the impact of the digitalization of society on sociology in 2000–2021. Frequency analysis of 13.8 thousand scientific texts tracked the introduction of concepts related to digitalization into academic circulation. The article reveals the differences between the journals, due ...

Added: March 18, 2026

Discriminative lemmatization of abbreviations in the LLM era

Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2025 Vol. 527 P. 146–155

This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...

Added: March 10, 2026

Rubic2: Ensemble Model for Russian Lemmatization

Afanasev I., Glazkova A., Lyashevskaya O. et al., in: Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025). Association for Computational Linguistics, 2025. P. 157–170.

Pre-trained language models have significantly advanced natural language processing (NLP), particularly in analyzing languages with complex morphological structures. This study addresses lemmatization for Russian, where errors can critically affect the performance of information retrieval, question answering, and other tasks. We present the results of experiments on generative lemmatization using pre-trained language ...

Added: March 10, 2026