Stable Topic Modeling with Local Density Regularization

Sergei Koltcov; S. I. Nikolenko; Olessia Koltsova; Vladimir Filippov; Svetlana Bodrunova

doi:10.1007/978-3-319-45982-0_16


P. 176–188.


Topic modeling has emerged over the last decade as a powerful tool for analyzing large text corpora, including Web-based user-generated texts. Topic stability, however, remains a concern: topic models have a very complex optimization landscape with many local maxima, and even different runs of the same model yield very different topics. Aiming to add stability to topic modeling, we propose an approach to topic modeling based on local density regularization, where words in a local context window of a given word have higher probabilities to get the same topic as that word. We compare several models with local density regularizers and show how they can improve topic stability while remaining on par with classical models in terms of quality metrics.
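The idea that words inside a local context window are more likely to share a topic can be illustrated with a minimal sketch. This is not the authors' model: the function name `local_density_weights`, the multiplicative form of the boost, and the `window` and `boost` parameters are all assumptions made for illustration. The sketch takes unnormalized per-topic weights for one token (for instance, the usual collapsed Gibbs sampling weights of LDA) and raises the weight of topics that are already assigned to neighboring tokens.

```python
from collections import Counter


def local_density_weights(base_weights, topics, position, window=2, boost=1.0):
    """Return normalized topic probabilities for the token at `position`.

    base_weights: unnormalized per-topic weights for this token, e.g. the
                  usual collapsed Gibbs sampling weights of LDA.
    topics:       current topic assignment of every token in the document.
    window:       number of tokens on each side that count as local context.
    boost:        hypothetical regularization strength (0 disables the effect).
    """
    lo = max(0, position - window)
    hi = min(len(topics), position + window + 1)
    # Count the topics of the neighbors in the window, skipping the token itself.
    neighbor_counts = Counter(topics[j] for j in range(lo, hi) if j != position)
    # Topics that are frequent among the neighbors get proportionally more weight.
    weights = [
        w * (1.0 + boost * neighbor_counts.get(k, 0))
        for k, w in enumerate(base_weights)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Toy document with two topics; the token at index 2 is surrounded by topic 0.
topics = [0, 0, 1, 0, 0]
probs = local_density_weights([1.0, 1.0], topics, position=2, window=2, boost=1.0)
flat = local_density_weights([1.0, 1.0], topics, position=2, window=2, boost=0.0)
print(probs, flat)
```

With equal base weights, the four neighbors assigned to topic 0 push its probability to 5/6, while `boost=0.0` leaves the distribution uniform. A real sampler would draw the new topic from these probabilities and update its count tables accordingly.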

Keywords: stability, Gibbs sampling, topic modeling, LDA, latent Dirichlet allocation

Publication based on the results of:

Internet use and Internet users: cross-country and cross-regional comparisons (2016)

In book

Internet Science, Proc. of 3rd Conf. INSCI 2016, Lecture Notes in Computer Science series

Vol. 9934. Switzerland: Springer, 2016.

Topic modelling for qualitative studies

Sergey Nikolenko, Sergei Koltcov, Olessia Koltsova, Journal of Information Science 2017 Vol. 43 No. 1 P. 88–102

Qualitative studies, such as sociological research, opinion analysis and media studies, can benefit greatly from automated topic mining provided by topic models such as latent Dirichlet allocation (LDA). However, examples of qualitative studies that employ topic modelling as a tool are currently few and far between. In this work, we identify two important problems along ...

Added: October 7, 2016

Do topics make a metaphor? Topic modeling for metaphor identification and analysis in Russian.

Badryzlova Y., Nikiforova A., Lyashevskaya O., in: Analysis of Images, Social Networks and Texts: 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020, Revised Selected Papers. Vol. 12602. Springer, 2021. P. 69–81.

The paper examines the efficiency of topic models as features for computational identification and conceptual analysis of linguistic metaphor on Russian data. We train topic models using three algorithms (LDA and ARTM – sparse and dense) and evaluate their quality. We compute topic vectors for sentences of a metaphor-annotated Russian corpus and train several classifiers ...

Added: October 7, 2020

A study of the discourse on biopedagogy using topic modeling and syntactic analysis of texts

Nagornyy O. S., Мухетдинова А. Т., in: Mathematical and Computer Modeling [Electronic resource]: Proceedings of the IV International Scientific Conference (Omsk, November 11, 2016). Omsk: Omsk State University Press, 2016. P. 154–156.

Using materials from the healthy-lifestyle section of the blog lifehacker.ru, this work applies topic modeling and syntactic analysis of texts to study how the discourse on biopedagogy manifests itself on the Internet, which linguistic means are used for this, and which topics are addressed. ...

Added: November 25, 2016

Stable topic modeling for web science: Granulated LDA

Koltsov S., Nikolenko S. I., Koltsova O. et al., in: WebSci 2016 - Proceedings of the 2016 ACM Web Science Conference. Elsevier, 2016. P. 342–343.

Topic modeling is a powerful tool for analyzing large collections of user-generated web content, but it still suffers from problems with topic stability, which are especially important for social sciences. We evaluate stability for different topic models and propose a new model, granulated LDA, that samples short sequences of neighboring words at once. We show that gLDA ...

Added: October 24, 2016

Strength and weakness: the dynamics of the representation of hegemonic masculinity in Russian-language rap

Zhuchkova S., Бойченко А. Е., Smirnov N., Журнал социологии и социальной антропологии 2024 Vol. 27 No. 1 P. 103–138

In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One of the reasons for that is rap’s social background. It emerged in the criminal area of New York first created by the deprived Black population. Using the notion of hegemonic ...

Added: February 11, 2024

What does Russian rap hide? Topic modeling of the lyrics of the Russian-language hip-hop scene

Бойченко А. Е., Zhuchkova S., Журнал социологии и социальной антропологии 2020 Vol. 23 No. 2 P. 130–165

The study presents an attempt at a complex exploratory analysis of Russian rap based on a corpus of texts of Russian-language songs of this genre. The corpus contains more than 11,000 texts that vary in their date of creation and popularity by more than 500 artists collected by automatically extracting data from web ...

Added: August 12, 2020

Academic Macroeconomics and Monetary Policy: Topic Modeling Based on Transcripts of the Meetings of the Federal Open Market Committee from 1976 to 2016

Bakeev M., OEconomia 2023

This paper explores the place of academic macroeconomics discourse in the discussions of monetary policy makers, using the transcripts of meetings of the Federal Open Market Committee (FOMC) from 1976 to 2016. Latent Dirichlet Allocation (LDA) is used to separate the transcripts into topics of discussion. The paper shows that policy makers with a PhD ...

Added: May 13, 2023

The phenomenon of attention in the information environment: the attention economy

Милкова М. А., Цифровая экономика 2020 No. 3 P. 73–87

The modern economy revolves more and more around the concentration of human attention, which means that the principles of attention management are the determining factor in the functioning of such an economy. Attention regulates how people interact with the world, both individually and socially. In addition, attracting attention and then reselling it is now a ...

Added: June 29, 2023

Changes in the Topical Structure of Russian-Language Livejournal: The Impact of Elections 2011

Maslinsky K. A., Koltsov S., Koltsova O. / NRU Higher School of Economics. Series SOC "Sociology". 2013. No. 14.

This study investigates the topical structure of the Russian-language blog-publishing service LiveJournal and the change in it that occurred in the course of the public activity after the State Duma elections in December 2011 as compared to a previous “control” period (November 27 – December 27 and August 15 – September 15 respectively). The data ...

Added: February 1, 2013

Topic Modeling of Literary Texts Using LDA: On the Influence of Linguistic Preprocessing on Model Interpretability

Sherstinova T., Moskvina A., Kirina M. et al., in: 2022 31st Conference of Open Innovations Association (FRUCT). Vol. 32. IEEE, 2022. P. 305–312.

The article describes the results of the research, the purpose of which was to evaluate the influence of linguistic preprocessing on the interpretability of topic models for literary texts. The study was carried out as part of a large project aimed to obtain topic models of Russian short stories written in the first three decades ...

Added: October 31, 2022

About the past, but at different times: a computational analysis of the texts of USSR/Russia history textbooks for six generations of students

Kolmogorova A., Колмогорова П. А., Куликова Е. Р., Вестник Томского государственного университета. Филология 2024 No. 89 P. 73–103

In this article, we focus on the analysis of the texts of three history textbooks for university students published at different times: in 1946, in 1983 and in 2006. As a material, we use texts devoted in each of the textbooks to seven historical topics since the beginnings of Kiev principality till the Reforms of ...

Added: December 10, 2023

Mapping the Public Agenda with Topic Modeling: The Case of the Russian LiveJournal

Koltsova O., Sergei Koltcov, Policy & Internet 2013 Vol. 5 No. 2 P. 207–227

The purpose of this research is to describe the agenda set by the Internet-active part of the Russian public in Russia’s leading blog platform LiveJournal. This is done by modelling LiveJournal’s topic structure, viewed as a reflection of online public opinion. Topic modelling is performed automatically with an LDA algorithm, and complemented with hand ...

Added: December 11, 2012

Latent Dirichlet Allocation: Stability and Applications to Studies of User-Generated content

Koltsov S., Koltsova O., Nikolenko S. I., in: Proceedings of WebSci '14 ACM Web Science Conference, Bloomington, IN, USA, June 23–26, 2014. NY: ACM, 2014. P. 161–165.

Topic modeling, in particular the Latent Dirichlet Allocation (LDA) model, has recently emerged as an important tool for understanding large datasets, in particular, user-generated datasets in social studies of the Web. In this work, we investigate the instability of LDA inference, propose a new metric of similarity between topics and a criterion of vocabulary reduction. ...

Added: October 17, 2014

CONSTRUCTING THE IMAGE OF A CITY IN OFFICIAL AND EVERYDAY COMMUNICATION: A COMPARATIVE ANALYSIS (BASED ON SOCIAL MEDIA)

Matkin N., Коммуникации. Медиа. Дизайн 2024

The article offers an analysis and visualization of Russian city images that emerge in the comments of urban community subscribers and posts from administrative press services. The city image is regarded as a frame structure that develops through political and interpersonal communication in the network. The social component of the city image is identified as ...

Added: November 15, 2023

ENGINEERING LINGUISTIC TECHNOLOGIES IN TEXT RESEARCH

Kolmogorova A., Terra Linguistica 2023 Vol. 14 No. 1 P. 7–10

The publication is devoted to the analysis of the current state of engineering linguistics, its main directions and research challenges. The definition of language technologies and their typology are formulated according to the criterion of the tasks solved with their help. It is noted that the national school of engineering linguistics manages to maintain a ...

Added: October 31, 2023

TEXTS OF DIFFERENT EMOTIONAL CLASSES AND THEIR TOPIC MODELING

Kolmogorova A., Qiuhua S., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2024 Vol. 23 No. 5 P. 60–71

The article is devoted to studying verbalization specifics of various emotional states in the texts in Russian with the purpose to confirm or refute the hypothesis that texts of different emotional classes reflect the denotative situation not identically, which is reflected in thematic specifics and lexical content. The research material consisted of eight corpus texts ...

Added: November 29, 2024

Internet Regulation: A Text-based Approach to Media Coverage

Shirokanova A., Silyutina O., in: Digital Transformation and Global Society: Third International Conference, DTGS 2018, St. Petersburg, Russia, May 30 – June 2, 2018, Revised Selected Papers, Part I. Issue 858. Cham: Springer, 2018. P. 181–194.

Internet regulation in Russia has vigorously expanded in recent years to transform the relatively free communication environment of the 2000s into a heavily regulated one. Our goal was to identify the topic structure of Russian media discourse on Internet regulation and compare it between political and non-political media outlets. We used structural topic modeling on ...

Added: October 10, 2018

Topic modeling for short texts: a comparative analysis

Vashchenko V., Социология: методология, методы, математическое моделирование 2023 No. 56 P. 69–112

The steady increase in the popularity of social media as a means of communication actualizes methodological issues related to processing of short texts with less semantic context than large corpora, which are widely used for training and testing machine learning models for textual data. Topic modeling, an unsupervised machine learning technique aimed at aggregating texts ...

Added: December 7, 2023

Digital Inequality in Russia through the Use of a Social Network Site: A Cross-Regional Comparison

Rykov Y., Nagornyy O. S., Koltsova O., in: Digital Transformation & Global Society: Second International Conference, DTGS 2017, St. Petersburg, Russia, June 21-23, 2017, Revised Selected Papers. Springer, 2017. P. 70–83.

An important role of digital inequality for hindering the development of civil society is being increasingly acknowledged. Simultaneously, differences in availability and the practices of use of social network sites (SNS) may be considered as major manifestations of such digital divide. While SNS are in principle highly convenient spaces for public discussion, lack of access ...

Added: October 23, 2017

Modifications of the EM algorithm for probabilistic topic modeling

Vorontsov K. V., Potapenko A., Машинное обучение и анализ данных 2013 Vol. 1 No. 6 P. 657–686

Probabilistic topic models discover a low-dimensional interpretable representation of text corpora by estimating a multinomial distribution over topics for each document and a multinomial distribution over terms for each topic. A unified family of expectation-maximization (EM) like algorithms with smoothing, sampling, sparsing, and robustness heuristics that can be used in any combinations is considered. The ...

Added: February 19, 2015

Investigation of the Sensitivity and Stability of Transfer Characteristic of Electromechanical Measuring Transducer of Small Values of Velocity Head of Rarefied Gas

Grachev N. N., Safonov S. N., in: 2018 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). IEEE, 2018. P. 1–4.

When measuring small, slowly changing physical parameters, such as parameters of the velocity of low-density gas flows, small values of aerodynamic forces, aerodynamic forces acting on aircraft in conditions of a rarefied gas environment, electrical signals coming from primary converters often have a low level, sometimes reaching very small values, is much less than the ...

Added: September 6, 2019

Studying Patterns of Communication in Virtual Urban Groups With Different Modes of Privacy

Vadim Voskresenskiy, Musabirov I., Alexandrov D. A. / NRU Higher School of Economics. Series SOC "Sociology". 2017.

This paper is concerned with online communication of apartment buildings' residents on general purpose social networking site (SNS) VKontakte (VK), focusing on how groups' participants use instruments of SNS to separate place-based discussions and participation in wider community initiatives. With the help of topic modeling algorithm LDA, we analyzed posts collected from online groups related ...

Added: October 20, 2017

Text mining in the social sciences

Byzov A., Социология: методология, методы, математическое моделирование 2019 No. 49 P. 131–160

Throughout most of their history, sociologists have sought to study unstructured organic texts: newspaper materials, diaries, memoirs, letters, documents, and, more recently, messages, publications and other texts on various online platforms. This article discusses how modern techniques of text mining can improve classical sociological approaches to the analysis of this type of data. The article ...

Added: December 9, 2019

Using topic modeling for communities clusterization in the VKontakte social network

Gorshkov S., Ilyushin E., Chernysheva A. et al., International Journal of Open Information Technologies 2021 Vol. 9 No. 5 P. 12–17

Topic modeling is one of the most widely used methods in text analysis. It can be used to select topics as well as to find the topics distributed in each document from the corpus. In this article, we present a method for clustering communities in the social network VKontakte (the most popular Russian social network) ...

Added: December 25, 2024