• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Articles
  • Fractal approach for determining the optimal number of topics in the field of topic modeling
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
April 30, 2026
HSE Researchers Compile Scientific Database for Studying Childrens Eating Habits
The database created at HSE University can serve as a foundation for studying children’s eating habits. This is outlined in the study ‘The Influence of Age, Gender, and Social-Role Factors on Children’s Compliance with Age-Based Nutritional Norms: An Experimental Study Using the Dish-I-Wish Web Application.’ The work has been carried out as part of the HSE Basic Research Programme and was presented at the XXVI April International Academic Conference named after Evgeny Yasin.
April 30, 2026
New Foresight Centre Study Identifies the Most Destructive Global Trends for Humankind
A team of researchers from the HSE International Research and Educational Foresight Centre has examined how global trends affect the quality of human life—from life expectancy to professional fulfilment. The findings of the study titled ‘Human Capital Transformation under the Influence of Global Trends’ were published in Foresight.
April 28, 2026
Scientists Develop Algorithm for Accurate Financial Time Series Forecasting
Researchers at the HSE Faculty of Computer Science benchmarked more than 200,000 model configurations for predicting financial asset prices and realised volatility, showing that performance can be improved by filtering out noise at specific frequencies in advance. This technique increased accuracy in 65% of cases. The authors also developed their own algorithm, which achieves accuracy comparable to that of the best models while requiring less computational power. The study has been published in Applied Soft Computing.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

Fractal approach for determining the optimal number of topics in the field of topic modeling

Journal of Physics: Conference Series. 2019. Vol. 1163. No. 1. P. 1–6.
Ignatenko V., Sergei Koltcov, Staab S., Boukhers Z.

In the framework of this paper we apply multifractal formalism to the analysis of
statistical behaviour of topic models under variation of the number of topics. Fractal analysis
of topic models allows to show that self-similar fractal clusters exist in large textual collections.
We provide numerical results for 3 topic models (PLSA, ARTM, LDA Gibbs sampling) on
2 datasets, namely, on an English-language dataset and on a Russian-language dataset. We
demonstrate that forming of clusters occurs precisely in the transition regions. Linear regions
do not lead to changes in fractals, therefore, it is sufficient to find transition regions for the
study of textual collections. Accordingly, the problem of the analysing the evolution of topic
models can be reduced to the problem of searching transition regions in topic models.

Priority areas: IT and mathematics
Language: English
Full text
DOI
Text on another site
Keywords: topic modelingmultifractal formalismвероятностное тематическое моделированиеoptimal number of topicsмультифрактальный подходоптимальное число тем
Publication based on the results of:
Social and textual measurement of social network user profiles (2018)
Similar publications
Natural hazard database from Internet publications: text mining with a large language model
Derkacheva A., Sakirkina M., Kraev G. et al., /. 2026.
Comprehensive data on natural hazards and their consequences are crucial for effective for risk assessment, adaptation planning, and emergency response. However, many countries face challenges with fragmented, inconsistent, and inaccessible data, particularly regarding local-scale events. To address this data gap in Russia, we developed an end-to-end processing pipeline that scrapes news from various online sources, ...
Added: April 28, 2026
Ising models on the hydrogen peroxide and other lattices
Qin X., Deng Y., Shchur L. et al., / Series arXiv "math". 2026. No. 2603.02962.
We perform a Monte Carlo analysis of the Ising model on many three-dimensional lattices. By means of finite-size scaling we obtain the critical points and determine the scaling dimensions. As expected, the critical exponents agree with the three-dimensional Ising universality class for all models. The irrelevant field, as revealed by the correction-to-scaling amplitudes, appears to ...
Added: April 20, 2026
Algorithmic overlaps as thermodynamic variables: from local to cluster Monte Carlo dynamics in critical phenomena
Pilé I., Deng Y., Shchur L., / Series arXiv "math". 2026. No. 2604.10254.
We investigate the spatial overlap of successive spin configurations in Markov chain Monte Carlo simulations using the local Metropolis algorithm and the Svendsen-Wang and Wolff cluster algorithms. We examine the dynamics of these algorithms for two models in different universality classes: the Ising model and the Potts model with three components. The overlap of two ...
Added: April 20, 2026
Using predefined vector systems to speed up neural network multimillion class classification
Gabdullin N., Androsov I., / Series Computer Science "arxiv.org". 2026.
Label prediction in neural networks (NNs) has O(n) complexity proportional to the number of classes. This holds true for classification using fully connected layers and cosine similarity with some set of class prototypes. In this paper we show that if NN latent space (LS) geometry is known and possesses specific properties, label prediction complexity can ...
Added: April 2, 2026
Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection
Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Серия cs.SI "Social and Information Networks ". 2025.
Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...
Added: January 15, 2026
Implementing Transport Coding in OMNeT++ for Message Delay Reduction
Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.
Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer:  original packets are encoded into  coded packets, and the message is reconstructed after the first  successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...
Added: December 24, 2025
Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset
Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXie "Statistical mechanics". 2025.
Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...
Added: December 1, 2025
Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning
Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.
We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...
Added: November 21, 2025
Optimizing Modality Weights in Topic Models of Transactional Data
Khrylchenko K., Vorontsov K. V., Automation and Remote Control 2022 Vol. 83 No. 12 P. 1908–1922
Added: November 19, 2025
Эффективный алгоритм торговли на фондовом рынке: ретроспективный анализ, основанный на данных по S&P-500.
Rubchinskiy A., Chubarova D., / Series WP7 "Математические методы анализа решений в экономике, бизнесе и политике". 2025. No. WP7/2025/01.
The article examines one of the most famous examples of socio-economic systems, characterized by significant uncertainty – the S&P-500 stock market, where shares of 500 largest US companies are traded. No assumptions are made about the probabilistic characteristics of the stock market. A flexible algorithm for daily trading has been developed, based on both known fixed data ...
Added: November 9, 2025
Diffusion on language model embeddings for protein sequence generation
Meshchaninov V., Strashnov, P., Shevtsov A. et al., / Cornell University. Серия CoRR, arXiv:2403.03726 "Computing Research Repository,". 2025.
Protein design requires a deep understanding of the inherent complexities of the protein universe. While many efforts lean towards conditional generation or focus on specific families of proteins, the foundational task of unconditional generation remains underexplored and undervalued. Here, we explore this pivotal domain, introducing DiMA, a model that leverages continuous diffusion on embeddings derived ...
Added: October 5, 2025
Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
Shabalin A., Meshchaninov V., Vetrov D., / Series cs.CL, arXiv:2505.18853 "Computation and Language". 2025.
Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in categorical simplex space, which respect discreteness but disregard semantic ...
Added: October 5, 2025
A Feature Engineering Framework for Computer Vision Based on Topological Data Analysis
Абрамов А. С., Chernyshev V. L., Mikhaylets E. et al., / Series Social Science Research Network "Social Science Research Network". 2025.
Computer vision is one of the most relevant modern research areas with broad practical applications. However, traditional solutions based on deep learning have signicant limitations and can be misleading. Topological data analysis, on the other hand, is a modern approach to solving similar problems using mathematically deterministic methods of algebraic topology that reduce the risk ...
Added: September 23, 2025
On the construction of frieze patterns from partitions of convex polygons by nonintersecting diagonals
Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 07600.
We demonstrate in an elementary way how to construct a frieze pattern of width m-3 from a partition of a convex m-gon by not intersecting diagonals. ...
Added: September 17, 2025
On one property of Catalan numbers
Kochetkov Y., / Series arXiv.org e-print archive "arXiv.math". 2025. No. 20584.
We give a new proof of the following statement: the Catalan number C_n is divisible by n+2, if n is odd and n<> 3k+1. ...
Added: September 9, 2025
Low Sets and Closure Properties of Counting Function Classes
Ivanashev Y., / Series Computer Science "arxiv.org". 2025.
Added: July 29, 2025
From productivity to wellbeing? Topic modelling of doctoral education research
Smirnov N., Higher Education 2026 Vol. 91 No. 3 P. 993–1021
Doctoral education has undergone significant transformations over the past two decades, driven by massification, internationalization, and the diversification of training models. These shifts have led to a growing body of research on doctoral education, yet little is known about the overarching thematic and geographical trends shaping this field. This study applies computational natural language processing ...
Added: May 26, 2025
Цифровое моделирование тематического поля изучения социального капитала поколений в организациях
Volkova N., Бордунос А. К., Чикер В. А. et al., Социальная психология и общество 2025 Т. 16 № 1 С. 5–27
Objective. Identify key topics presented in contemporary research on the relationship between social capital and generational differences in organizations, utilizing digital processing approaches on a dataset of scientific publications. Background. The emergence of new technologies, labor migration, and the involvement of representatives of different generations in labor activities have highlighted the process of continuous socialization of individuals in ...
Added: May 5, 2025
Войти через госуслуги? Факторы отношения к сервисам электронного правительства в социальных медиа
Егоров В. Ю., Philippov I., Akhremenko A. S., Мониторинг общественного мнения: Экономические и социальные перемены 2025 № 1 С. 214–239
The focus of the work is related to the public perception of government practices within the framework of digitalization policy. Electronic practices of interaction with the government have long been widespread among most Russians. This is confirmed by both public opinion polls and Russia’s high positions in the world rankings of e-government development. In this ...
Added: May 1, 2025
Censorship as a Dissociative Force: A Case of Sovremennik Magazine, 1847–1866
Vozhik E., Maslinsky K., Lisiukov R., CEUR Workshop Proceedings 2024 P. 938–949
The article focuses on the systemic effects of censorship that manifest themselves in the content of published materials that successfully passed the censorship filters. We understand censorship as a special kind of collective imagination about the (in)acceptable, inherent in a particular political context and influencing the decision-making logic by different actors. The idea is that ...
Added: April 3, 2025
Using topic modeling for communities clusterization in the VKontakte social network
Gorshkov S., Ilyushin E., Chernysheva A. et al., International Journal of Open Information Technologies 2021 Vol. 9 No. 5 P. 12–17
Topic modeling is one of the most widely used methods in text analysis. It can be used to select topics as well as to find the topics distributed in each document from the corpus. In this article, we present a method for clustering communities in the social network VKontakte (the most popular Russian social network) ...
Added: December 25, 2024
TEXTS OF DIFFERENT EMOTIONAL CLASSES AND THEIR TOPIC MODELING
Kolmogorova A., Qiuhua S., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2024 Vol. 23 No. 5 P. 60–71
The article is devoted to studying verbalization specifics of various emotional states in the texts in Russian with the purpose to confirm or refute the hypothesis that texts of different emotional classes reflect the denotative situation not identically, which is reflected in thematic specifics and lexical content. The research material consisted of eight corpus texts ...
Added: November 29, 2024
Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics
Sergei Koltcov, Surkov A., Filippov V. et al., PeerJ Computer Science 2024 Vol. 10 P. 41
Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics ...
Added: February 16, 2024
Сила и слабость: динамика репрезентации гегемонной маскулинности в русскоязычном рэпе
Zhuchkova S., Бойченко А. Е., Smirnov N., Журнал социологии и социальной антропологии 2024 Т. 27 № 1 С. 103–138
In public and academic debate, rap is often presented as one of the most aggressive music genres, depicting violence and cruelty in various ways. One of the reasons for that is rap’s social background. It emerged in the criminal area of New York first created by the deprived Black population. Using the notion of hegemonic ...
Added: February 11, 2024
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit