Interval Semi-Supervised LDA: Classifying Needles in a Haystack
P. 265–274.
An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for the social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
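The interval restriction described in the abstract can be illustrated with a minimal sketch. It assumes a standard collapsed Gibbs sampler for LDA in which a keyword's topic is drawn only from its assigned interval, while all other words range over every topic. The function name `interval_lda_gibbs`, its parameters, and the toy corpus are hypothetical and are not taken from the paper; the authors' actual sampler may differ in detail.

```python
import random
from collections import defaultdict


def interval_lda_gibbs(docs, n_topics, keyword_intervals,
                       n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA with interval-restricted keywords.

    docs: list of token lists.
    keyword_intervals: dict mapping a keyword to an inclusive (lo, hi)
        range of topic indices (an assumed encoding of the restriction).
    Returns the per-token topic assignments and document-topic counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    def allowed(word):
        # Keywords see only their interval; other words see all topics.
        lo, hi = keyword_intervals.get(word, (0, n_topics - 1))
        return range(lo, hi + 1)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics

    def update(d, word, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][word] += delta
        topic_total[k] += delta

    # Random initialization that already respects the intervals.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.choice(allowed(w)) for w in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the current assignment
                topics = list(allowed(w))
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)
    return z, doc_topic


# Toy usage: "ethnic" may only be assigned to topics 0 or 1 out of 4.
docs = [["ethnic", "minority", "news"],
        ["weather", "rain", "news"],
        ["ethnic", "culture", "rain"]]
z, doc_topic = interval_lda_gibbs(docs, n_topics=4,
                                  keyword_intervals={"ethnic": (0, 1)})
```

Because keywords can never leave their interval, the topics inside it tend to gather documents that co-occur with those keywords, and the several topics within one interval can then expose further structure among the found texts.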
In book
* I: Advances in Artificial Intelligence and Its Applications. Berlin: Springer, 2013.
Khrylchenko K., Vorontsov K. V., Automation and Remote Control 2022 Vol. 83 No. 12 P. 1908–1922
Added: November 19, 2025
Sikachev A., Veselova A., Управленец 2026 Vol. 17 No. 1 P. 65–83
As small and medium-sized enterprises (SMEs) strive for expansion beyond their domestic borders, the appeal of international markets is undoubtedly attractive. However, there are often numerous obstacles to this journey, which can be complex for companies without experience in international expansion. This article aims to fill the existing gap in the literature by thoroughly analyzing ...
Added: August 21, 2025
I. V. Loginova, A. S. Piekalnits, E. A. Sabidaeva et al., Scientific and Technical Information Processing 2025 Vol. 52 No. 6 P. 738–751
The purpose of this paper is to advance and automate language models for extracting statements related to events and factors from text documents using the designed linguistic marker system. The paper presents the results of testing text-mining models for event and factor extraction, using the example of analytical research in human potential, social sciences and ...
Added: July 18, 2025
Smirnov N., Higher Education 2026 Vol. 91 No. 3 P. 993–1021
Doctoral education has undergone significant transformations over the past two decades, driven by massification, internationalization, and the diversification of training models. These shifts have led to a growing body of research on doctoral education, yet little is known about the overarching thematic and geographical trends shaping this field. This study applies computational natural language processing ...
Added: May 26, 2025
Volkova N., Бордунос А. К., Чикер В. А. et al., Социальная психология и общество 2025 Vol. 16 No. 1 P. 5–27
Objective. Identify key topics presented in contemporary research on the relationship between social capital and generational differences in organizations, utilizing digital processing approaches on a dataset of scientific publications.
Background. The emergence of new technologies, labor migration, and the involvement of representatives of different generations in labor activities have highlighted the process of continuous socialization of individuals in ...
Added: May 5, 2025
Егоров В. Ю., Philippov I., Akhremenko A. S., Мониторинг общественного мнения: Экономические и социальные перемены 2025 No. 1 P. 214–239
The focus of the work is related to the public perception of government practices within the framework of digitalization policy. Electronic practices of interaction with the government have long been widespread among most Russians. This is confirmed by both public opinion polls and Russia’s high positions in the world rankings of e-government development. In this ...
Added: May 1, 2025
Vozhik E., Maslinsky K., Lisiukov R., CEUR Workshop Proceedings 2024 P. 938–949
The article focuses on the systemic effects of censorship that manifest themselves in the content of published materials that successfully passed the censorship filters. We understand censorship as a special kind of collective imagination about the (in)acceptable, inherent in a particular political context and influencing the decision-making logic by different actors. The idea is that ...
Added: April 3, 2025
Gorshkov S., Ilyushin E., Chernysheva A. et al., International Journal of Open Information Technologies 2021 Vol. 9 No. 5 P. 12–17
Topic modeling is one of the most widely used methods in text analysis. It can be used to select topics as well as to find the topics distributed in each document from the corpus. In this article, we present a method for clustering communities in the social network VKontakte (the most popular Russian social network) ...
Added: December 25, 2024
Kolmogorova A., Qiuhua S., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2024 Vol. 23 No. 5 P. 60–71
The article studies how various emotional states are verbalized in Russian-language texts, in order to confirm or refute the hypothesis that texts of different emotional classes reflect the denotative situation differently, which shows in their thematic specifics and lexical content. The research material consisted of eight corpus texts ...
Added: November 29, 2024
Malik M. S., Nawaz A., Jamjoom M. M. et al., Intelligent Data Analysis 2024 Vol. 28 No. 4 P. 1045–1065
Online product reviews (OPR) are a commonly used medium for consumers to communicate their experiences with products during online shopping. Previous studies have investigated the helpfulness of OPRs using frequency-based, linguistic, meta-data, readability, and reviewer attributes. In this study, we explored the impact of robust contextual word embeddings, topic, and language models in predicting the ...
Added: February 26, 2024
Sergei Koltcov, Surkov A., Filippov V. et al., PeerJ Computer Science 2024 Vol. 10 P. 41
Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics ...
Added: February 16, 2024
Zhuchkova S., Бойченко А. Е., Smirnov N., Журнал социологии и социальной антропологии 2024 Vol. 27 No. 1 P. 103–138
In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One of the reasons for that is rap’s social background. It emerged in the criminal area of New York first created by the deprived Black population. Using the notion of hegemonic ...
Added: February 11, 2024
Kolmogorova A., Колмогорова П. А., Куликова Е. Р., Вестник Томского государственного университета. Филология 2024 No. 89 P. 73–103
In this article, we focus on the analysis of the texts of three history textbooks for university students published at different times: in 1946, in 1983 and in 2006. As a material, we use texts devoted in each of the textbooks to seven historical topics since the beginnings of Kiev principality till the Reforms of ...
Added: December 10, 2023
Vashchenko V., Социология: методология, методы, математическое моделирование 2023 No. 56 P. 69–112
The steady increase in the popularity of social media as a means of communication actualizes methodological issues related to processing of short texts with less semantic context than large corpora, which are widely used for training and testing machine learning models for textual data. Topic modeling, an unsupervised machine learning technique aimed at aggregating texts ...
Added: December 7, 2023
Matkin N., Коммуникации. Медиа. Дизайн 2025 Vol. 10 No. 3 P. 89–110
The article offers an analysis and visualization of Russian city images that emerge in the comments of urban community subscribers and posts from administrative press services. The city image is regarded as a frame structure that develops through political and interpersonal communication in the network. The social component of the city image is identified as ...
Added: November 15, 2023
Kolmogorova A., Залевская Е. Д., Филологический класс 2023 Vol. 28 No. 2 P. 22–33
The article investigates the heuristic productivity of computer-assisted topic modeling for the philological analysis of fiction. The study analyzes the results of applying Latent Dirichlet Allocation (LDA) to search for intertextual connections of motifs in two sub-corpora of fiction texts: 62 texts of different genres (stories, essays, ...
Added: October 31, 2023