Fast Tuning of Topic Models: An Application of Rényi Entropy and Renormalization Theory

Ch. 5. P. 1–8.
Koltsov S., Ignatenko V., Pashakhin S.

In practice, the critical step in building machine learning models of big data (BD) is parameter tuning with a grid search, a procedure that is costly in terms of both time and computing resources. Because of their size, BD are comparable to mesoscopic physical systems, so methods of statistical physics can be applied to them. The paper shows that topic modeling demonstrates self-similar behavior as the number of clusters varies. Such behavior makes it possible to use a renormalization technique. Combining a renormalization procedure with the Rényi entropy approach allows a fast search for the optimal number of clusters. In this paper, the renormalization procedure is developed for the Latent Dirichlet Allocation (LDA) model with a variational Expectation-Maximization algorithm. The experiments were conducted on two document collections, in two languages, with a known number of clusters. The paper presents results for three versions of the renormalization procedure: (1) renormalization with random merging of clusters, (2) renormalization based on minimal values of Kullback–Leibler divergence, and (3) renormalization that merges the clusters with minimal values of Rényi entropy. The paper shows that the renormalization procedure finds the optimal number of topics 26 times faster than grid search without significant loss of quality.
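
The general idea of coarse-graining one fitted solution, instead of refitting the model for every candidate number of topics, can be illustrated with a small sketch. This is not the authors' procedure: the function names are hypothetical, the merge rule (averaging the pair of topics with the smallest symmetric Kullback–Leibler divergence) only loosely mirrors version (2), and the score recorded at each step is the textbook Rényi entropy of order `q` averaged over topics, used here as a stand-in for the paper's own entropy criterion.

```python
import math

def renyi_entropy(p, q):
    """Textbook Rényi entropy of order q (q > 0, q != 1) of a distribution p."""
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def kl_divergence(p, r, eps=1e-12):
    """Kullback–Leibler divergence KL(p || r); eps guards against log(0)."""
    return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(p, r) if x > 0)

def merge_closest_topics(topics):
    """Merge the two topics with minimal symmetric KL divergence.

    Each topic is a word distribution (a list summing to 1).
    Returns a new list that contains one topic fewer.
    """
    best = None
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            d = kl_divergence(topics[i], topics[j]) + kl_divergence(topics[j], topics[i])
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    # Assumption: merged topic = plain average, which keeps it normalized.
    merged = [(a + b) / 2.0 for a, b in zip(topics[i], topics[j])]
    return [t for k, t in enumerate(topics) if k not in (i, j)] + [merged]

def renormalize(topics, min_topics=2, q=2.0):
    """Repeatedly merge topics, recording a score for each topic count.

    The score (mean per-topic Rényi entropy) is an illustrative placeholder.
    """
    curve = {}
    current = [list(t) for t in topics]
    while True:
        curve[len(current)] = sum(renyi_entropy(t, q) for t in current) / len(current)
        if len(current) <= min_topics:
            break
        current = merge_closest_topics(current)
    return curve

# Toy topic-word matrix: four topics over a four-word vocabulary.
topics = [
    [0.70, 0.10, 0.10, 0.10],
    [0.68, 0.12, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.10, 0.70, 0.10, 0.10],
]
curve = renormalize(topics, min_topics=2)
```

In this toy setting a single pass over one fitted topic-word matrix yields a score for every topic count from the starting value down to `min_topics`, which is the kind of saving over a grid search that the abstract describes.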

Language: English
Keywords: topic modeling, Rényi entropy, optimal number of topics, renormalization theory
Publication based on the results of:
Social networks as a socio-psychological and textual phenomenon (2019)

In book

Proceedings of the 5th International Electronic Conference on Entropy and Its Applications
Vol. 46. Issue 1. MDPI AG, 2020.
Similar publications
Optimizing Modality Weights in Topic Models of Transactional Data
Khrylchenko K., Vorontsov K. V., Automation and Remote Control 2022 Vol. 83 No. 12 P. 1908–1922
Added: November 19, 2025
Institutional Determinants and Emerging Trends in Foreign Market Entry Strategies by Small and Medium Enterprises: A Systematic Literature Review
Sikachev A., Veselova A., Управленец 2026 Vol. 17 No. 1 P. 65–83
As small and medium-sized enterprises (SMEs) strive for expansion beyond their domestic borders, the appeal of international markets is undoubtedly attractive. However, there are often numerous obstacles to this journey, which can be complex for companies without experience in international expansion. This article aims to fill the existing gap in the literature by thoroughly analyzing ...
Added: August 21, 2025
From productivity to wellbeing? Topic modelling of doctoral education research
Smirnov N., Higher Education 2026 Vol. 91 No. 3 P. 993–1021
Doctoral education has undergone significant transformations over the past two decades, driven by massification, internationalization, and the diversification of training models. These shifts have led to a growing body of research on doctoral education, yet little is known about the overarching thematic and geographical trends shaping this field. This study applies computational natural language processing ...
Added: May 26, 2025
Digital Modeling of the Thematic Field of Research on the Social Capital of Generations in Organizations
Volkova N., Бордунос А. К., Чикер В. А. et al., Социальная психология и общество 2025 Vol. 16 No. 1 P. 5–27
Objective. Identify key topics presented in contemporary research on the relationship between social capital and generational differences in organizations, utilizing digital processing approaches on a dataset of scientific publications. Background. The emergence of new technologies, labor migration, and the involvement of representatives of different generations in labor activities have highlighted the process of continuous socialization of individuals in ...
Added: May 5, 2025
Log In via Gosuslugi? Factors Shaping Attitudes toward E-Government Services on Social Media
Егоров В. Ю., Philippov I., Akhremenko A. S., Мониторинг общественного мнения: Экономические и социальные перемены 2025 No. 1 P. 214–239
The focus of the work is related to the public perception of government practices within the framework of digitalization policy. Electronic practices of interaction with the government have long been widespread among most Russians. This is confirmed by both public opinion polls and Russia’s high positions in the world rankings of e-government development. In this ...
Added: May 1, 2025
Censorship as a Dissociative Force: A Case of Sovremennik Magazine, 1847–1866
Vozhik E., Maslinsky K., Lisiukov R., CEUR Workshop Proceedings 2024 P. 938–949
The article focuses on the systemic effects of censorship that manifest themselves in the content of published materials that successfully passed the censorship filters. We understand censorship as a special kind of collective imagination about the (in)acceptable, inherent in a particular political context and influencing the decision-making logic by different actors. The idea is that ...
Added: April 3, 2025
Using topic modeling for communities clusterization in the VKontakte social network
Gorshkov S., Ilyushin E., Chernysheva A. et al., International Journal of Open Information Technologies 2021 Vol. 9 No. 5 P. 12–17
Topic modeling is one of the most widely used methods in text analysis. It can be used to select topics as well as to find the topics distributed in each document from the corpus. In this article, we present a method for clustering communities in the social network VKontakte (the most popular Russian social network) ...
Added: December 25, 2024
TEXTS OF DIFFERENT EMOTIONAL CLASSES AND THEIR TOPIC MODELING
Kolmogorova A., Qiuhua S., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2024 Vol. 23 No. 5 P. 60–71
The article is devoted to studying verbalization specifics of various emotional states in the texts in Russian with the purpose to confirm or refute the hypothesis that texts of different emotional classes reflect the denotative situation not identically, which is reflected in thematic specifics and lexical content. The research material consisted of eight corpus texts ...
Added: November 29, 2024
Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics
Sergei Koltcov, Surkov A., Filippov V. et al., PeerJ Computer Science 2024 Vol. 10 P. 41
Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics ...
Added: February 16, 2024
Random forests with parametric entropy-based information gains for classification and regression problems
Ignatenko V., Surkov A., Sergei Koltcov, PeerJ Computer Science 2024 Vol. 10 Article e1775
The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the output of multiple decision trees to form a single result. Random forest algorithms demonstrate the highest accuracy on tabular data compared to other algorithms in various applications. However, random forests and, more precisely, decision trees, are usually ...
Added: February 16, 2024
Strength and Weakness: The Dynamics of the Representation of Hegemonic Masculinity in Russian-Language Rap
Zhuchkova S., Бойченко А. Е., Smirnov N., Журнал социологии и социальной антропологии 2024 Vol. 27 No. 1 P. 103–138
In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One of the reasons for that is rap’s social background. It emerged in the criminal area of New York first created by the deprived Black population. Using the notion of hegemonic ...
Added: February 11, 2024
About the Past, but at Different Times: A Computational Analysis of USSR/Russia History Textbooks for Six Generations of Students
Kolmogorova A., Колмогорова П. А., Куликова Е. Р., Вестник Томского государственного университета. Филология 2024 No. 89 P. 73–103
In this article, we focus on the analysis of the texts of three history textbooks for university students published at different times: in 1946, in 1983 and in 2006. As a material, we use texts devoted in each of the textbooks to seven historical topics since the beginnings of Kiev principality till the Reforms of ...
Added: December 10, 2023
Topic Modeling for Short Texts: A Comparative Analysis
Vashchenko V., Социология: методология, методы, математическое моделирование 2023 No. 56 P. 69–112
The steady increase in the popularity of social media as a means of communication actualizes methodological issues related to processing of short texts with less semantic context than large corpora, which are widely used for training and testing machine learning models for textual data. Topic modeling, an unsupervised machine learning technique aimed at aggregating texts ...
Added: December 7, 2023
Constructing the Image of a City in Official and Everyday Communication: A Comparative Analysis (Based on Social Media Data)
Matkin N., Коммуникации. Медиа. Дизайн 2025 Vol. 10 No. 3 P. 89–110
The article offers an analysis and visualization of Russian city images that emerge in the comments of urban community subscribers and posts from administrative press services. The city image is regarded as a frame structure that develops through political and interpersonal communication in the network. The social component of the city image is identified as ...
Added: November 15, 2023
Computer Modeling as a Tool for Analyzing Literary Text
Kolmogorova A., Залевская Е. Д., Филологический класс 2023 Vol. 28 No. 2 P. 22–33
The article investigates the heuristic productivity of computer-assisted topic modeling for the philological analysis of fiction. The study analyzes the results of applying the Latent Dirichlet Allocation (LDA) algorithm to search for intertextual connections of motifs in two sub-corpora of fiction texts: 62 texts of different genres (stories, essays, ...
Added: October 31, 2023