Topic Modeling of the Russian Short Story, 1900–1930: The Most Frequent Topics and Their Dynamics

Т. Ю. Шерстинова; А. Д. Москвина; М. А. Кирина; А. С. Карышева; Е. О. Колпащикова

doi:10.28995/2075-7182-2022-21-512-526


Pp. 512–526.


The article describes the results of an experiment in topic modeling of Russian short stories from three successive historical periods of the early 20th century:

1) from the beginning of the century to 1913;
2) the war and revolutionary period (1914–1922);
3) the early Soviet period (1923–1930).

Using the Latent Dirichlet Allocation (LDA) algorithm, nine models were built: three samples of different sizes (100, 500, and 1,000 stories) for each period. Every model turned out to contain very frequent "themes" (topics), each of which characterizes, with high probability, a fairly large share of the texts in its sample. Moreover, these frequent topics show meaningful dynamics across the periods. This allows us to treat them as thematic and stylistic markers of the analyzed text collections, alongside more traditional quantitative measures of text analysis. The diversity of frequent topics proved to be higher in the second and third periods, which can be explained by the greater lexical and stylistic diversity of the prose of the "era of change."
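The abstract names LDA but does not give the toolkit, the number of topics, or the criterion for calling a topic "frequent." The sketch below is therefore not the authors' pipeline. It has three parts:

- `lda_gibbs`: a minimal, dependency-free collapsed Gibbs sampler, which is one standard way to fit LDA.
- `topic_coverage`: a hypothetical helper that returns, for each topic, the share of texts in which that topic's probability reaches an assumed threshold.
- `build_models`: a hypothetical loop that mirrors the periods × sample-sizes design.

All hyperparameters (`alpha`, `beta`, `n_topics`, `n_iter`) and the 0.5 threshold are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of already preprocessed tokens.
    Returns theta, where theta[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in topics]           # word counts per topic
    nk = [0] * n_topics                       # token counts per topic
    z = []                                    # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed document-topic distributions; each row sums to 1.
    return [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


def topic_coverage(theta, threshold=0.5):
    """Share of documents whose probability for each topic is >= threshold.

    The 0.5 default is a hypothetical cut-off, not taken from the paper.
    """
    n_docs = len(theta)
    return [sum(row[k] >= threshold for row in theta) / n_docs
            for k in range(len(theta[0]))]


def build_models(corpus, sizes=(100, 500, 1000), n_topics=10, n_iter=200, seed=0):
    """Fit one model per (period, sample size).

    corpus: hypothetical dict mapping a period label to its tokenized stories.
    With three periods and the three default sizes this yields nine models.
    n_topics=10 is a placeholder; the abstract does not state the number of topics.
    """
    rng = random.Random(seed)
    models = {}
    for period, stories in corpus.items():
        for size in sizes:
            sample = rng.sample(stories, min(size, len(stories)))
            models[(period, size)] = lda_gibbs(sample, n_topics, n_iter=n_iter, seed=seed)
    return models


if __name__ == "__main__":
    war = [["war", "front", "soldier", "trench"] * 3 for _ in range(4)]
    village = [["village", "field", "hut", "meadow"] * 3 for _ in range(4)]
    theta = lda_gibbs(war + village, n_topics=2, n_iter=50)
    print(topic_coverage(theta))
```

Comparing the `topic_coverage` vectors of the models built for different periods is one possible way to trace the period-to-period dynamics described above. Topics from separately trained models are not aligned, though, so matching them across periods still requires interpretation. Real texts would also need tokenization and, most likely, lemmatization first; this sketch leaves both out.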

Keywords: computational linguistics

Publication based on the results of:

Artificial Intelligence Methods for Philological Research (2021)

In book

Computational Linguistics and Intellectual Technologies: Papers from the International Conference "Dialogue 2022", Issue 21

Issue 21. RSUH Publishing, 2022.

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Issue 24.

Moscow: Max press, 2026.

The volume includes 64 papers from the international conference on computational linguistics and intelligent technologies 'Dialogue 2026,' representing a broad spectrum of theoretical and applied research in the field of natural language description, language process modeling, and the development of practically applicable computational linguistic technologies. For specialists in theoretical and applied linguistics and intelligent technologies. ...

Added: June 27, 2026

Automatic Detection of Inducements in Text: Applying Computational Linguistics Methods in the Work of an Expert Linguist

П.Е. Белова, А.К. Сафарян, In: Scientific and Practical Conference with International Participation "National and International Trends and Prospects in the Development of Forensic Expertise". Collection of Papers. Nizhny Novgorod: Lobachevsky State University of Nizhny Novgorod Press, 2024.

This article describes FindImper, a system for automatically finding and extracting inducements from Russian-language texts, based on searching for verb forms and syntactic relations. The algorithm is implemented in Python using libraries for morphological and syntactic analysis together with a set of rules. The tool is intended to streamline the work of an expert linguist and is available for use via a website ...

Added: January 30, 2026

Discursive Capabilities of Large Language Models in Generating New Texts

Mylnikova A., Гасимов А. Р., Scientific and Technical Information. Series 2: Information Processes and Systems 2025 No. 9 P. 33–38

Based on a study of how large language models (LLMs) function and of the specific characteristics of machine discourse processing, the paper demonstrates an experimental method of computational and linguistic analysis for the statistical study and interpretation of the linguistic characteristics of texts. The research material comprises the Brown corpus as well as corpora of texts artificially generated with Claude Sonnet 3.7 and Grok-3. In the processing mechanisms ...

Added: November 19, 2025

Employing computational linguistic technologies and oculography to develop diagnostic tool for detecting autoaggressive tendencies in young people: a riveted gaze into “get rid of the shackles of this world”

Khomenko A., Kasimova L., Sychugov E. et al., Psychiatria Danubina 2025 Vol. 37 No. Suppl. 1 P. 213–223

Background: Early recognition of autoaggressive tendencies in young people is essential for diagnostic screening and reducing suicidality risks. This can be achieved through psycholinguistic approaches such as corpus analysis and eye-tracking studies. Corpus research helps to develop generalized speech patterns of those at risk of suicide, while oculographic methods examine perceptual cues linked to suicidal ...

Added: October 19, 2025

Computational linguistics and intellectual technologies. Papers from the Annual International Conference "Dialogue" (2025)

[s.n.], 2025.

This collection includes 39 papers from the Dialogue 2025 International Conference on Computational Linguistics and Intelligent Technologies, representing a wide range of theoretical and applied research in the fields of natural language description, modeling language processes, and the development of practical computational linguistic technologies. This publication is intended for specialists in theoretical and applied linguistics and ...

Added: October 19, 2025

Thematic Annotation of an Anthropological Corpus: A Methodology for Classifying Miners' Narratives

Мазитова Л. Л., Panteleeva L., Vestnik of Samara University. History, Pedagogics, Philology 2024 Vol. 30 No. 4 P. 156–164

The article describes the methodology for creating an anthropological corpus of texts united by their connection to the mining profession. The work addresses three research tasks: 1) developing a thematic classification, 2) introducing conventions for marking narratives in the text, and 3) determining the principles for organizing the corpus according to the themes of ...

Added: January 18, 2025

Linguistic Complexity of Texts of the "Virtual Museum Tour" Genre (Based on the Virtual Visit to the State Hermitage Museum)

Kolmogorova A., Куликова Е. Р., Колмогорова П. А., Text. Book. Publishing 2025 No. 38 P. 29–54

The article is devoted to the linguistic characterization of the texts of the Virtual Visit to the State Hermitage Museum, available on its official website. The purpose of the study is to analyze a set of lexical, morphological, syntactic, and discursive metrics of the linguistic complexity of these texts in comparison with the same ...

Added: November 8, 2024

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Issue 21, supplementary volume. Moscow: RSUH, 2022. Pp. 1001–1190.

RSUH, 2022.

The collection includes 17 papers from the international conference on computational linguistics and intelligent technologies "Dialogue 2022", representing a broad spectrum of theoretical and applied research in computational modeling of natural language and the development of new linguistic technologies. For specialists in theoretical and applied linguistics and intelligent technologies. ...

Added: May 24, 2024

The Eleventh International Conference on Computer Processing of Turkic Languages "TurkLang 2023"

Kazan: Publishing House of the Academy of Sciences of the Republic of Tatarstan, 2023.

This year the eleventh international conference on computer processing of Turkic languages, TurkLang-2023, was held at Bukhara State University. The previous 10 conferences took place in Astana (2013, 2022), Istanbul (2014), Kazan (2015, 2017), Bishkek (2016), Tashkent (2018), Simferopol (2019), Ufa (2020), and Kyzyl (2021). The geography of the conference, the number of papers presented, and the composition of its participants confirm that at present ...

Added: March 6, 2024

Linguistic mechanisms of colour term evolution: A diachronic investigation of “Russian browns” buryj and koričnevyj

Bochkarev V. V., Shevlyakova A., Solovyev V. et al., Diachronica 2023 Vol. 40 No. 4 P. 492–531

We investigated diachrony of distributional semantics of two competing Russian colour terms (CTs) for ‘brown’, buryj (11th century) and koričnevyj (17th century), using the Russian subcorpus of Google Books Ngram (2020). Time-series analysis (1800–2019) of bigrams gauged each term’s frequencies of occurrence and changes in combinability with nouns for natural objects, artefacts, abstract concepts and figurative expressions. In frequency, koričnevyj overtook buryj in the ...

Added: February 19, 2024

DEVELOPING A SYSTEM FOR GENERATING EVERYDAY DIALOGUES IN RUSSIAN: A PILOT STUDY

Кругликова В. Г., In: Speech Analysis: Theoretical and Applied Aspects: A Collection of Scientific Articles. [s.n.], 2023.

The article presents a comparative analysis of various language models used for text generation and evaluates their effectiveness for the task of generating conversational speech. The comparison covers models such as GPT-3, BERT, and LSTM. This study is part of a project to develop a system for generating dialogues in Russian. The ...

Added: December 10, 2023

Think about what you’ve learned: Sentiment Analysis for Modeling User Experience in Online Education

Kirina M., Human Being: Image and Essence. Humanitarian Aspects 2024 No. 2(58) P. 176–204

The article focuses on the application of opinion mining techniques to evaluate user experience on the Hyperskill educational platform, using Python, Java, and Kotlin programming projects as the basis of analysis. The study utilizes sentiment analysis and keyword extraction methods to gauge users' attitudes towards the platform, learning process, and topics covered. To achieve this, ...

Added: December 9, 2023

Literature as a Social Network: Semantic Editions of Classical Texts

Kolmogorova A., Virtual Communication and Social Networks 2023 Vol. 2023 No. 3(7) P. 124–130

The paper introduces the phenomenon of semantic editions: a new digital representation of texts and personalities of the Great Literature, e. g., The World of Dante, Mapping the Republic of Letters, Chekhov Digital, Tolstoy Digital, and Pushkin Digital. The author analyzed these platforms to reveal the methods and ideology behind this new format. The everyday ...

Added: October 31, 2023

ENGINEERING LINGUISTIC TECHNOLOGIES IN TEXT RESEARCH

Kolmogorova A., Terra Linguistica 2023 Vol. 14 No. 1 P. 7–10

The publication is devoted to the analysis of the current state of engineering linguistics, its main directions and research challenges. The definition of language technologies and their typology are formulated according to the criterion of the tasks solved with their help. It is noted that the national school of engineering linguistics manages to maintain a ...

Added: October 31, 2023

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue". Issue 22. Supplementary Volume

[s.n.], 2023.

The collection includes 17 papers from the international conference on computational linguistics and intelligent technologies "Dialogue 2023", representing a broad spectrum of theoretical and applied research in natural language description, modeling of language processes, and the development of practically applicable computational linguistic technologies. For specialists in theoretical and applied linguistics and intelligent technologies. ...

Added: September 14, 2023