Topic Modeling for Short Texts: A Comparative Analysis

Социология: методология, методы, математическое моделирование. 2023. No. 56. P. 69–112.

Vashchenko V.

The steady growth in the popularity of social media as a means of communication raises methodological issues in processing short texts, which carry less semantic context than the large corpora widely used to train and test machine learning models for textual data. Topic modeling, an unsupervised machine learning technique that aggregates texts into topic clusters, has many academic and practical applications where the true groupings of texts are unknown. However, the performance of topic modeling algorithms may be limited by the need for sufficient semantic context to build a high-quality numerical representation of a unit of text, and a short document may not provide that context. This paper discusses three approaches to topic modeling: classical LDA enriched with pre-trained word embeddings, topic modeling based on the BERT transformer model, and a network-based approach using stochastic blockmodels. We compare these algorithms on a set of Russian-language comments on TikTok and formally evaluate them on speed and on the coherence of the resulting topics.

Research target: Sociology (including Demography and Anthropology); Media and Communications; Computer Science

Language: Russian

Publication based on the results of:

Development of network analysis in Russia: adaptation of theoretical and methodological approaches and practical application (2024)

Stable On-the-Fly Learning for Dynamic Neural Networks With Delayed Inputs

Kibkalo Vladislav, Chertopolokhov V., Mukhamedov A. et al., IEEE Access 2026 Vol. 14 P. 14369–14392

This study presents on-the-fly identification and multi-step prediction of nonlinear systems with delayed inputs using a dynamic neural network combined with a smooth projection onto ellipsoids. The projection enforces parameter constraints that guarantee stability, while a Lyapunov–Krasovskii analysis yields computable ultimate error bounds. Riccati-type matrix inequalities are derived, providing an efficient vectorization–projection–devectorization implementation suitable for ...

Added: May 22, 2026

Applying Social Network Analysis (SNA) to the Historical Narrative of a Multi-Actor Region (the Case of the Welsh Chronicle Brut y Tywysogyon)

Loshkareva M. E., Matveeva N., Вестник Томского государственного университета. История 2026 No. 100 P. 112–118

This study applies social network analysis (SNA) to a medieval narrative source. The authors suggest that network analysis may open new possibilities for studying the history of regions characterized by a degree of political fragmentation. The authors attempted to construct networks of historical interactions from 1193 ...

Added: May 22, 2026

Aesthetics of Audiovisual Journalism: A Textbook. 2nd edition

Novikova A., Бережная М. А., Кирия И. В., КноРус, 2026.

The aesthetics of journalism is substantiated as a necessary component in the professional training of specialists in audiovisual media. The factors and trends of historical and current changes in the aesthetics of journalism are presented, and the aesthetic practices of audiovisual journalism are characterized in terms of their social functioning. Criteria for aesthetic evaluation are ...

Added: May 22, 2026

Problems of Integrating Cultural Heritage into the Creative Industries of the Republic of Tuva

Монгуш В. Р., Novikova A., Креативные индустрии 2026 Vol. 2 No. 1 P. 23–41

This article analyzes the historical and cultural background, as well as the current situation and development prospects of the creative industries ecosystem in the Republic of Tuva. A comparative analysis of this remote, subsidized region and its neighbors, the Sakha Republic (Yakutia) and Krasnoyarsk Krai, revealed its strengths, vulnerabilities, and strategies of young creative professionals ...

Added: May 21, 2026

Health-Related Lifestyles of Russian Youth: Gender Differences

Orekhov A., Zakharov A., Мониторинг общественного мнения: Экономические и социальные перемены 2026 No. 2 P. 3–23

This article investigates the health-related lifestyles of Russian youth. Utilizing longitudinal data from the «Trajectories in Education and Profession» study (N = 3398, 2022), a latent class analysis was conducted, identifying three distinct classes of young people (mean age: 26): those adhering to a healthy lifestyle, those prone to unhealthy habits, and those passive about ...

Added: May 21, 2026

Theory of the Partisan: Intermediate Commentary on the Concept of the Political. 2nd ed., revised and expanded

Шмитт К., М.: Праксис, 2026.

A classic work by the well-known German jurist and political theorist Carl Schmitt, devoted to the partisan as a "figure of the world spirit," from its emergence during the Spanish people's struggle against Napoleonic forces in 1808–1813 through to the fate of the partisan during the "global civil war" of the twentieth century. Translated from the German by Yu. Yu. Korinets. New revision of the translation by T. A. Dmitriev ...

Added: May 20, 2026

Max Weber's Three Russias: Toward a Weberian Sociology of Russian Modernity

Kildyushov O., Мир России: Социология, этнология 2026 Vol. 35 No. 2 P. 6–21

This article examines a heuristic framework for analyzing the significance of Russian themes in Max Weber’s corpus, in connection with the completion of the complete edition of his works as a comprehensive source base. It highlights the ambivalent position of Russian themes in his legacy: while Russia was never central to his scholarship, the issue ...

Added: May 20, 2026

ML-based Fast Simulation of FARICH Responses

Shipilov F., Barnyakov A., Ivanov A. et al., / Series Physics "arxiv.org". 2026.

A fast simulation of the detector response is a vital task in high-energy physics (HEP). Traditional Monte-Carlo methods form the backbone of modern particle physics simulation software but are computationally expensive. We present a machine-learning-based approach to fast simulation of the Focusing Aerogel Ring Imaging Cherenkov (FARICH) detector response. Given a particle track and momentum, ...

Added: May 19, 2026

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Rabat: Association for Computational Linguistics, 2026.

Added: May 19, 2026

Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures

Bezzubov S., Malikov D., Krasnov L. et al., Scientific data 2026 Vol. 13 Article 727

Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, in technological processes mixtures of solvents are often utilized, making the solubility assessment more complicated. Predicting solubility values in mixtures of solvents from a molecular structure can help to address this issue, although a ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Pikalov V., Meshcheryakov V., Kondratev S. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27

This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Kondratev S., Yulia Dyrchenkova, Georgiy Nikitin et al., Technologies 2026 Vol. 14 No. 1 Article 69

Added: May 19, 2026

Health-Related Lifestyles of Russian Youth: Gender Differences

Zakharov A., Мониторинг общественного мнения: Экономические и социальные перемены 2026 No. 2 P. 3–23

Added: May 19, 2026

Parallel Computational Technologies. PCT 2025

Springer, 2025.

This book constitutes the refereed proceedings of the 19th International Conference on Parallel Computational Technologies, PCT 2025, held in Moscow, Russia, during April 8–10, 2025. The 31 full papers included in this volume were carefully reviewed and selected from 122 submissions. These papers were organized under the following topical sections: High Performance Architectures, Tools and Technologies; ...

Added: May 18, 2026

KMHCR: A Key-Controlled Signal-Domain Transformation for 5G IoT Security

Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15

To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...

Added: May 16, 2026

DPN Verifier: A Toolkit for Faster Soundness Verification and Repair of Process Models with Data

Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66

Data Petri Nets (DPNs) extend classical Petri nets to model processes where data directly influences control flow, enabling a comprehensive view of system behavior and making it possible to detect failure points that could otherwise be hidden. Soundness is a correctness criterion that captures such failure points as deadlocks and livelocks as well as model boundedness and absence ...

Added: May 16, 2026

An Approach to Automatic Emotion Recognition in Speech Transcripts

Dvoynikova A., Кондратенко К. О., Известия высших учебных заведений. Приборостроение 2023 Vol. 66 No. 10 P. 818–827

The problem of emotion recognition in speech transcripts, which is relevant in various fields, is investigated. The influence of preprocessing methods (stop-word removal, lemmatization, stemming) on the accuracy of emotion recognition in Russian- and English-language text data is analyzed. The experiments used orthographic transcripts of dialogues from the multimodal corpora RAMAS and CMU-MOSEI, in Russian and English respectively. The annotation of these corpora ...

Added: April 25, 2026

Eco-Reality and Eco-Image of Russian Regions in Public Pages of the VKontakte Social Network

Nemirovskaya A., Муничкина О. П., Вестник Института социологии 2026 Vol. 17 No. 1 P. 183–208

This paper examines media representation of environmental problems in six Russian regions through the lens of regional public pages (with official and unofficial status) on the VKontakte social network, which function as online media. Based on a content analysis of news public pages on VKontakte from six Russian regions, including both environmentally favorable and unfavorable ...

Added: April 1, 2026

Emoducts of Happiness: Commodification and Marketing Strategies in Popular Psychology

Matkin N., Novikova A., Экономическая социология 2026 Vol. 27 No. 1 P. 92–124

In the context of the growing demand for psychological services in Russia and the spread of therapeutic culture, digital platforms like YouTube are becoming a key locus for the commercialization of emotions. However, the mechanisms of commodification, particularly concerning happiness, remain underexplored in this digital environment. This article examines how popular Russian psychological bloggers on ...

Added: February 2, 2026

Optimizing Modality Weights in Topic Models of Transactional Data

Khrylchenko K., Vorontsov K. V., Automation and Remote Control 2022 Vol. 83 No. 12 P. 1908–1922

Added: November 19, 2025

Interaction of Functional Brain Networks Is Associated With k-Clique Percolation in the Human Structural Connectome

Dogonasheva O., Zakharov D., Tiselko V. et al., Human Brain Mapping 2025 Vol. 46 No. 15 Article e70343

The human structural connectome has a complex internal community organization, characterized by a high degree of overlap and related to functional and cognitive phenomena. We explored connectivity properties in connectome networks and showed that 𝑘‐clique percolation of an anomalously high order is characteristic of the human structural connectome. The resulting structural organization maintains a high local ...

Added: November 11, 2025

Analyzing the Topics of Everyday Conversations: An Expert Approach and Automatic Methods

Sherstinova T., Вепринцева Д. А., Человек: образ и сущность. Гуманитарные аспекты 2025 No. 2(62) P. 89–108

The article examines three different approaches to studying the topics of everyday conversations: expert topic annotation and two automatic methods (topic modeling and clustering). The material for the study consisted of transcripts of Russian everyday spoken language from the ORD corpus, prepared from audio recordings of spontaneous conversations made in natural communicative situations (at home, at work, at an educational institution, in a shop, at a clinic ...

Added: September 3, 2025

Institutional Determinants and Emerging Trends in Foreign Market Entry Strategies by Small and Medium Enterprises: A Systematic Literature Review

Sikachev A., Veselova A., Управленец 2026 Vol. 17 No. 1 P. 65–83

As small and medium-sized enterprises (SMEs) strive for expansion beyond their domestic borders, the appeal of international markets is undoubtedly attractive. However, there are often numerous obstacles to this journey, which can be complex for companies without experience in international expansion. This article aims to fill the existing gap in the literature by thoroughly analyzing ...

Added: August 21, 2025

Modification of the SBERT Language Model for Identifying ESG Risks Based on Textual Data on Companies and on Control and Supervisory Activities

Buzmakov A. V., Kirpishchikov D., Naidenova I. N. et al., Вестник Санкт-Петербургского университета. Серия 10. Прикладная математика. Информатика. Процессы управления 2025 Vol. 21 No. 1 P. 75–91

An approach has been developed to identify risks associated with companies’ environmental impact, social responsibility, and governance quality (Environmental, Social, and Governance - ESG risks) based on textual information about the company. To achieve this, a modification of the SBERT language model is proposed with a clearly defined distance function for the embedding space. The model is trained on ...

Added: June 6, 2025