Interval Semi-supervised LDA: Classifying Needles in a Haystack


P. 265–274.

Bodrunova S., Koltsov S., Koltsova O., Nikolenko S. I., Shimorina A.

An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
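
The idea of restricting keywords to intervals of topic assignments can be illustrated with a minimal sketch. It assumes a standard collapsed Gibbs sampler for LDA in which the sampling distribution of each keyword token is masked to zero outside its allowed topic interval; this is one plausible reading of the abstract, not the authors' implementation. The function name `interval_lda_gibbs`, the `intervals` input format, the hyperparameter values, and the toy corpus are all hypothetical.

```python
import numpy as np

def interval_lda_gibbs(docs, vocab_size, n_topics, intervals,
                       alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA with interval-restricted keywords.

    docs:      list of documents, each a list of integer word ids.
    intervals: dict word_id -> (lo, hi), the inclusive range of topics that
               the keyword may be assigned to (format assumed for this sketch).
    Returns topic assignments z and the count matrices n_dt, n_tw.
    """
    rng = np.random.default_rng(seed)

    # mask[w, t] = 1 if word w may be assigned to topic t, else 0.
    mask = np.ones((vocab_size, n_topics))
    for w, (lo, hi) in intervals.items():
        mask[w, :] = 0.0
        mask[w, lo:hi + 1] = 1.0

    n_dt = np.zeros((len(docs), n_topics))   # document-topic counts
    n_tw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_t = np.zeros(n_topics)                 # tokens per topic

    # Initialise every token with a random topic from its allowed set.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = int(rng.choice(n_topics, p=mask[w] / mask[w].sum()))
            z_d.append(t)
            n_dt[d, t] += 1
            n_tw[t, w] += 1
            n_t[t] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t = z[d][i]
                n_dt[d, t] -= 1
                n_tw[t, w] -= 1
                n_t[t] -= 1
                # Usual LDA conditional, zeroed outside the allowed interval.
                p = (mask[w] * (n_dt[d] + alpha) * (n_tw[:, w] + beta)
                     / (n_t + vocab_size * beta))
                t = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = t
                n_dt[d, t] += 1
                n_tw[t, w] += 1
                n_t[t] += 1
    return z, n_dt, n_tw

# Toy corpus: word 0 is a keyword tied to topics 0..1, word 3 to topic 2 only.
docs = [[0, 1, 2, 0], [3, 4, 3, 2], [0, 3, 1, 4]]
intervals = {0: (0, 1), 3: (2, 2)}
z, n_dt, n_tw = interval_lda_gibbs(docs, vocab_size=5, n_topics=4,
                                   intervals=intervals)
```

In this sketch, words that are not keywords remain free to join any topic, so documents sharing vocabulary with the keywords can still be drawn into the reserved topic intervals, while the remaining topics absorb unrelated material.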

Language: English

Keywords: text mining, topic modeling, latent Dirichlet allocation

Publication based on the results of:

Social and Political Processes Online (2013)

In book

Proceedings of the 12th Mexican International Conference on Artificial Intelligence (MICAI 2013)

* I: Advances in Artificial Intelligence and Its Applications. Berlin: Springer, 2013.

Similar publications

Optimizing Modality Weights in Topic Models of Transactional Data

Khrylchenko K., Vorontsov K. V., Automation and Remote Control 2022 Vol. 83 No. 12 P. 1908–1922

Added: November 19, 2025

From productivity to wellbeing? Topic modelling of doctoral education research

Smirnov N., Higher Education 2025

Doctoral education has undergone significant transformations over the past two decades, driven by massification, internationalization, and the diversification of training models. These shifts have led to a growing body of research on doctoral education, yet little is known about the overarching thematic and geographical trends shaping this field. This study applies computational natural language processing ...

Added: May 26, 2025

Digital modeling of the thematic field of research on the social capital of generations in organizations

Volkova N., Бордунос А. К., Чикер В. А. et al., Социальная психология и общество 2025 Vol. 16 No. 1 P. 5–27

Objective. Identify key topics presented in contemporary research on the relationship between social capital and generational differences in organizations, utilizing digital processing approaches on a dataset of scientific publications. Background. The emergence of new technologies, labor migration, and the involvement of representatives of different generations in labor activities have highlighted the process of continuous socialization of individuals in ...

Added: May 5, 2025

Log in via Gosuslugi? Factors shaping attitudes toward e-government services in social media

Егоров В. Ю., Philippov I., Akhremenko A. S., Мониторинг общественного мнения: Экономические и социальные перемены 2025 No. 1 P. 214–239

The focus of the work is related to the public perception of government practices within the framework of digitalization policy. Electronic practices of interaction with the government have long been widespread among most Russians. This is confirmed by both public opinion polls and Russia’s high positions in the world rankings of e-government development. In this ...

Added: May 1, 2025

Censorship as a Dissociative Force: A Case of Sovremennik Magazine, 1847–1866

Vozhik E., Maslinsky K., Lisiukov R., CEUR Workshop Proceedings 2024 P. 938–949

The article focuses on the systemic effects of censorship that manifest themselves in the content of published materials that successfully passed the censorship filters. We understand censorship as a special kind of collective imagination about the (in)acceptable, inherent in a particular political context and influencing the decision-making logic by different actors. The idea is that ...

Added: April 3, 2025

Using topic modeling for communities clusterization in the VKontakte social network

Gorshkov S., Ilyushin E., Chernysheva A. et al., International Journal of Open Information Technologies 2021 Vol. 9 No. 5 P. 12–17

Topic modeling is one of the most widely used methods in text analysis. It can be used to select topics as well as to find the topics distributed in each document from the corpus. In this article, we present a method for clustering communities in the social network VKontakte (the most popular Russian social network) ...

Added: December 25, 2024

TEXTS OF DIFFERENT EMOTIONAL CLASSES AND THEIR TOPIC MODELING

Kolmogorova A., Qiuhua S., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2024 Vol. 23 No. 5 P. 60–71

The article is devoted to studying verbalization specifics of various emotional states in the texts in Russian with the purpose to confirm or refute the hypothesis that texts of different emotional classes reflect the denotative situation not identically, which is reflected in thematic specifics and lexical content. The research material consisted of eight corpus texts ...

Added: November 29, 2024

Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics

Sergei Koltcov, Surkov A., Filippov V. et al., PeerJ Computer Science 2024 Vol. 10 P. 41

Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics ...

Added: February 16, 2024

Strength and weakness: the dynamics of representing hegemonic masculinity in Russian-language rap

Zhuchkova S., Бойченко А. Е., Smirnov N., Журнал социологии и социальной антропологии 2024 Vol. 27 No. 1 P. 103–138

In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One of the reasons for that is rap’s social background. It emerged in the criminal area of New York first created by the deprived Black population. Using the notion of hegemonic ...

Added: February 11, 2024

About the past, but at different times: a computational analysis of USSR/Russian history textbook texts for six generations of students

Kolmogorova A., Колмогорова П. А., Куликова Е. Р., Вестник Томского государственного университета. Филология 2024 No. 89 P. 73–103

In this article, we focus on the analysis of the texts of three history textbooks for university students published at different times: in 1946, in 1983 and in 2006. As a material, we use texts devoted in each of the textbooks to seven historical topics since the beginnings of Kiev principality till the Reforms of ...

Added: December 10, 2023

Topic modeling for short texts: a comparative analysis

Vashchenko V., Социология: методология, методы, математическое моделирование 2023 No. 56 P. 69–112

The steady increase in the popularity of social media as a means of communication actualizes methodological issues related to processing of short texts with less semantic context than large corpora, which are widely used for training and testing machine learning models for textual data. Topic modeling, an unsupervised machine learning technique aimed at aggregating texts ...

Added: December 7, 2023

Constructing the image of a city in official and everyday communication: a comparative analysis (based on social media data)

Matkin N., Коммуникации. Медиа. Дизайн 2025 Vol. 10 No. 3 P. 89–110

The article offers an analysis and visualization of Russian city images that emerge in the comments of urban community subscribers and posts from administrative press services. The city image is regarded as a frame structure that develops through political and interpersonal communication in the network. The social component of the city image is identified as ...

Added: November 15, 2023

Computer modeling as a tool for analyzing literary text

Kolmogorova A., Залевская Е. Д., Филологический класс 2023 Vol. 28 No. 2 P. 22–33

The article investigates the issue of heuristic productivity of using the method of computer-assisted topic modeling for philological analysis of fiction text. The study analyzes the results of applying the Latent Dirichlet Allocation (LDA) algorithm to search for intertextual connections of motifs in two sub-corpora of fiction texts: 62 texts of different genres (stories, essays, ...

Added: October 31, 2023

ENGINEERING LINGUISTIC TECHNOLOGIES IN TEXT RESEARCH

Kolmogorova A., Terra Linguistica 2023 Vol. 14 No. 1 P. 7–10

The publication is devoted to the analysis of the current state of engineering linguistics, its main directions and research challenges. The definition of language technologies and their typology are formulated according to the criterion of the tasks solved with their help. It is noted that the national school of engineering linguistics manages to maintain a ...

Added: October 31, 2023

Literary heritage of the 19th–20th centuries: classification of raster images for intelligent analysis and topic modeling of a corpus of handwritten texts

Penskaja E., Khachaturyan L., Филологические науки. Научные доклады высшей школы 2023 No. 5 P. 160–165

The article examines current trends in working with digital forms of handwritten heritage on the history of Russian literature from the second half of the 19th to the mid-20th century. The process of forming virtual archives is analyzed as a gradual accumulation of the “big data” of scientific research: an unrecognized information array of raster ...

Added: October 30, 2023

NLP methods for automatic candidate’s CV segmentation

Tikhonova M., Gavrishchuk A., in: 2019 International Conference on Engineering and Telecommunication (EnT). IEEE, 2019. P. 1–5.

The problem of CV (or resume) segmentation and automatic extraction is becoming increasingly relevant, since it could simplify the candidate selection process. The paper proposes a new method of automatic CV segmentation and parsing. The described algorithm is based on Natural Language Processing and Machine Learning methods. The proposed procedure makes it possible to extract information ...

Added: September 22, 2023
