Analysis and tuning of hierarchical topic models based on Renyi entropy approach

S. Koltsov; V. Ignatenko; M. Terpilowski; P. Rosso

doi:10.7717/peerj-cs.608


PeerJ Computer Science. 2021. Vol. 7. Article e608.


Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections, and it additionally allows constructing a hierarchy that represents levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets whose number of topics is known from human mark-up: three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model is significantly unstable and, moreover, that the numbers of topics it yields are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach can determine only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy.
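The abstract does not state the metric's formula. The sketch below is only a hedged illustration. It computes the flat, single-level Renyi entropy used in earlier entropy-based work on topic models for one topic-word matrix Φ with T topics and W words. It makes several assumptions:

- The deformation parameter is q = 1/T.
- Only word probabilities above the uniform level 1/W are kept.
- ρ is the share of kept entries, also called the density of states.
- P̃ is the kept probability mass divided by T.
- The entropy is S = ln(Z_q)/(q − 1), with Z_q = ρ·P̃^q.

The sketch does not reproduce how the paper extends this to several hierarchy levels. The function names `renyi_entropy` and `pick_num_topics` and the toy matrices are hypothetical.

```python
import math


def renyi_entropy(phi):
    """Renyi entropy of a topic-word matrix phi (T rows, each a distribution over W words).

    Assumed formulation (not necessarily the paper's hierarchical metric):
    q = 1/T, threshold 1/W, Z_q = rho * p_tilde**q, S = ln(Z_q) / (q - 1).
    """
    num_topics = len(phi)
    vocab_size = len(phi[0])
    if num_topics < 2:
        raise ValueError("need at least two topics (q = 1/T must differ from 1)")
    threshold = 1.0 / vocab_size
    # Keep only word probabilities above the uniform level 1/W.
    selected = [p for row in phi for p in row if p > threshold]
    if not selected:
        raise ValueError("no word probability exceeds 1/W")
    rho = len(selected) / (vocab_size * num_topics)  # density of states
    p_tilde = sum(selected) / num_topics             # mean retained probability mass per topic
    q = 1.0 / num_topics                             # deformation parameter
    log_z = q * math.log(p_tilde) + math.log(rho)    # ln Z_q = q*ln(p_tilde) + ln(rho)
    return log_z / (q - 1.0)


def pick_num_topics(models):
    """Given a dict {T: phi_T} of fitted matrices, return the T with minimal Renyi entropy."""
    return min(models, key=lambda t: renyi_entropy(models[t]))


if __name__ == "__main__":
    # Hypothetical toy matrices standing in for models trained with T = 2 and T = 3.
    phi2 = [[0.7, 0.1, 0.1, 0.1],
            [0.1, 0.1, 0.4, 0.4]]
    phi3 = [[0.97, 0.01, 0.01, 0.01],
            [0.01, 0.97, 0.01, 0.01],
            [0.01, 0.01, 0.49, 0.49]]
    print(round(renyi_entropy(phi2), 4))        # 2.2493
    print(pick_num_topics({2: phi2, 3: phi3}))  # 3
```

Under this formulation, the candidate T whose matrix gives the lowest entropy would be taken as the estimate. With real models, each matrix would come from a separate training run with a different number of topics.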

Research target: Computer Science

Priority areas: IT and mathematics

Language: English


Keywords: topic modeling, Renyi entropy, optimal number of topics, hierarchical topic models

Publication based on the results of:

Modeling the structure and socio-psychological factors of news perception (2022)

Mathematical methods of reinforcement learning

Belomestny D., Gasnikov A., Gladin E. et al., Russian Mathematical Surveys 2026 Vol. 81 No. 4(490) P. 3–90

Reinforcement learning (RL) is increasingly grounded in tools from probability, optimization, and operator theory. This survey organizes the mathematical structures that underpin the design and analysis of modern algorithms in RL. We begin from Markov decision processes (MDPs) and the Bellman operators, emphasizing contraction mappings, monotonicity, and fixed-point theory that yield convergence guarantees and rates ...
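The contraction property mentioned above can be illustrated with a textbook example. This example is not taken from the survey. On a finite MDP with discount factor γ < 1, the Bellman optimality operator is a γ-contraction in the sup norm. By the Banach fixed-point theorem, value iteration therefore converges to its unique fixed point. The two-state MDP, its rewards, and the helper names below are hypothetical.

```python
# Hypothetical 2-state, 2-action MDP.
# P[s][a] lists (next_state, probability) pairs; R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.9  # discount factor; contraction requires GAMMA < 1


def bellman(v):
    """Bellman optimality operator: (Tv)(s) = max_a [R(s, a) + GAMMA * sum_s' P(s'|s, a) v(s')]."""
    return [
        max(R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]) for a in P[s])
        for s in sorted(P)
    ]


def sup_dist(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


def value_iteration(tol=1e-10, max_iter=10_000):
    """Apply the Bellman operator from v = 0 until successive iterates are tol-close."""
    v = [0.0] * len(P)
    for _ in range(max_iter):
        v_next = bellman(v)
        if sup_dist(v_next, v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    u, w = [3.0, -1.0], [0.5, 4.0]
    # Contraction check: ||Tu - Tw|| <= GAMMA * ||u - w|| in the sup norm.
    print(sup_dist(bellman(u), bellman(w)) <= GAMMA * sup_dist(u, w))  # True
    print(value_iteration())
```

Because each application of the operator shrinks the sup-norm distance by at least a factor of γ, the stopping test on successive iterates also bounds how far the returned vector is from being a fixed point.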

Added: August 3, 2026

Chepovskiy A. M. Analysis of Natural-Language Text Corpora: Mathematical Methods. Textbook. Moscow: Мастерская Печати Идей, 2026. 274 pp., ill.

Chepovskiy A., Мастерская Печати Идей, 2026.

The textbook presents methods and algorithms for the automatic analysis of natural-language text corpora. It is intended for those studying methods of processing natural-language texts and of creating training text datasets, including students, graduate students, and researchers in computational linguistics and text processing. ...

Added: August 1, 2026

Three Algorithms for Merging Hierarchical Navigable Small World Graphs

Ponomarenko A., Series Computer Science "arxiv.org", 2025.

This paper addresses the challenge of merging hierarchical navigable small world (HNSW) graphs, a critical operation for distributed systems, incremental indexing, and database compaction. We propose three algorithms for this task: Naive Graph Merge (NGM), Intra Graph Traversal Merge (IGTM), and Cross Graph Traversal Merge (CGTM). These algorithms differ in their approach to vertex selection ...

Added: July 30, 2026

Professional Verification: A Guide to Advanced Functional Verification

Wilcox P., Romanov A., Moscow: DMK Press, 2025.

The book you are holding continues the series "Bookshelf of the Dedicated Engineer" («Книжная полка истового инженера»), published with the support of YADRO. This book is a textbook on the theoretical foundations of advanced functional verification and presents best practices in current use. It describes the unified verification methodology (UVM) in detail and covers topics such as the functional virtual prototype, functional coverage, assertions, formal verification, testbenches, co-simulation, emulation, hardware ...

Added: July 30, 2026

EEG evidence for reproducible neural states during Buddhist Highest Yoga Tantra meditation

Mikhaylets E. V., Razorenova A. M., Chernyshev V. L. et al., Scientific Reports 2026 Vol. 16 Article 23560

Meditation offers a naturalistic paradigm for studying introspection, yet the neural dynamics of advanced tantric practices remain largely unexplored. Buddhist Highest Yoga Tantra (BHYT) comprises a sequence of eight dissolution stages culminating in the “clear light” state. We recorded EEG during eyes-closed BHYT meditation performed in monasteries and hermitages (51 sessions from 36 male practitioners; ...

Added: July 29, 2026

Machine Learning-based Adaptive Reconstruction of Video Stream Fragments Taking into Account Scene Dynamics. Proceedings of the Institute for System Programming of the RAS

Думкин Н. А., Alexandrov D., Прозорский М. А., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 1 P. 255–274

A theoretically sound approach to adaptive client-side video fragment restoration is proposed using machine learning and scene analysis methods. The method includes a formal problem statement, a finite-state machine model for decision making, a restoration cost function, and a new stage in video preparation: scene dynamics assessment followed by recording a feature in an HLS playlist. This feature ...

Added: July 27, 2026

Automated Reasoning: 13th International Joint Conference, IJCAR 2026, Lisbon, Portugal, July 26–29, 2026, Proceedings, Part II. (LNCS, volume 16689)

Cham: Springer, 2026.

This open access set, LNAI 16688-16689, constitutes the proceedings of the 13th International Joint Conference, IJCAR 2026, held in Lisbon, Portugal, during July 26–29, 2026. The 41 full research papers and 8 short papers included in these two volumes were carefully reviewed and selected from 112 submissions. The papers cover the following topical sections: Part I: Theorem ...

Added: July 26, 2026

Local Fault-Tolerant Routing in 3D Mesh NoCs using Single-Hop Rollback

Edward R. Rzaev, Aleksandr Y. Romanov, Andrey M. Sukhov, IEEE Access 2026 Vol. 14 P. 2169–3536

This work presents a hierarchy of strictly local fault-tolerant routing algorithms for 3D mesh networks-on-chip, culminating in an algorithm that combines a live-neighbor selection rule with a bounded single-hop rollback mechanism. The proposed algorithms operate exclusively on immediate neighbor information, maintain O(1) per hop complexity, and require no global topology knowledge, additional virtual channels, or ...

Added: July 23, 2026

Bibliometrics of Folklore: Russian Proverbs in Research Journals

Pislyakov V., Tomsk State University Journal of Philology 2026 No. 101 P. 175–192

This article examines the use of proverbs in academic texts—specifically, articles published in Russian research journals. For the experiment, ten proverbs were selected as the intersection of two fundamentally different paremiological surveys aimed at compiling lists of popular or common Russian proverbs. One of these surveys was conducted by the classic of paremiology, G.L. Permyakov, ...

Added: July 22, 2026

SIGIR '26: Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval

Association for Computing Machinery (ACM), 2026.

Wominjeka, and welcome to the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2026), held in Melbourne | Naarm, Australia, from 20–24 July 2026. SIGIR 2026 takes place on the unceded lands of the Woi Wurrung and Boon Wurrung language groups of the eastern Kulin nation, and we pay our ...

Added: July 22, 2026

Long-range machine-learning potentials with environment-dependent charges enable predicting LO-TO splitting and dielectric constants

Korogod D., Shapeev A., Ivan S. Novikov, Physical Review B: Condensed Matter and Materials Physics 2026 Vol. 114 No. 2 Article 024104

We present two models with explicit long-range electrostatics in the form of Coulomb interactions. Both models include point charges depending on their local atomic environments, and the second model also conserves the total charge of an atomic system. We combine the proposed long-range models with the local moment tensor potential (MTP) and demonstrate that they ...

Added: July 22, 2026

Global optimization of atomic clusters via physically constrained tensor train decomposition

Sozykin K., Rybin N., Chertkov A. et al., Physical Review B: Condensed Matter and Materials Physics 2026 Vol. 113 No. 22 Article 224111

The global optimization of atomic clusters represents a fundamental challenge in computational chemistry and materials science due to the exponential growth of local minima with system size (i.e., the curse of dimensionality). We introduce a framework that overcomes this limitation by exploiting the low-rank structure of potential energy surfaces through tensor train (TT) decomposition. Our ...

Added: July 22, 2026

WSI-GT: Pseudo-Label Guided Graph Transformer for Whole-Slide Histology

Михайлов И. А., Machine Learning and Knowledge Extraction 2026 Vol. 8 No. 1 Article 8

Whole-slide histology images (WSIs) can exceed 100 k × 100 k pixels, making direct pixel-level segmentation infeasible and requiring patch-level classification as a practical alternative for downstream WSI segmentation. However, most approaches either treat patches independently, ignoring spatial and biological context, or rely on deep graph models prone to oversmoothing and loss of local tissue ...

Added: July 16, 2026

On the construction of Barnes–Wall lattices and their application in cryptography

Kuninets A., Malygina E., Leevik A. G. et al., Journal of Computer Virology and Hacking Techniques 2026 No. 22 Article 62

In this work, we investigate the application of Barnes–Wall lattices in post-quantum cryptographic schemes. We survey and analyze several constructions of Barnes–Wall lattices, including subgroup chains, the generalized k-ing construction, and connections with Reed-Muller codes, highlighting their equivalence over both Z[i] and Z. Building on these structural insights, we introduce a new algorithm for efficient ...

Added: July 16, 2026

Tencent and Open Source: How Does China's Most Valuable Brand View Open-Source Software?

Silakov D., System Administrator 2026 No. 5 P. 46–51

In the previous article on open source in the PRC [1], we discussed Alibaba, a large corporation ranked thirtieth among the world's most significant brands for 2025 [2]. That is an honorable position, but not the highest among Chinese companies: thirteenth place went to Tencent, the developer of WeChat and a number of other products widely used by our eastern neighbors. Tencent ...

Added: July 14, 2026

2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

IEEE, 2026.

Added: July 13, 2026

Mathematical Optimization Theory and Operations Research, 25th International Conference, MOTOR 2026 Irkutsk, Russia, July 6–11, 2026 Proceedings

Switzerland: Springer, 2026.

This volume contains the refereed proceedings of the 25th International Conference on Mathematical Optimization Theory and Operations Research (MOTOR 2026), held on July 6–11 in a picturesque place near Lake Baikal, Irkutsk, Russia. The MOTOR conference is a direct successor and scientific inheritor of several prominent events on mathematical programming, combinatorial and stochastic optimization, ...

Added: July 12, 2026

Problems of Infinite Regular Realizability

Шиманогов И. Н., Vyalyi M., Discrete Analysis and Operations Research 2025 Vol. 32 No. 4(166) P. 213–230

A well-studied class of algorithmic problems is that of regular realizability: checking the non-emptiness of the intersection of a regular language with a given language. This problem has a natural algebraic interpretation: verifying whether an element of a Boolean algebra belongs to the kernel of a certain homomorphism. This motivates the consideration of an analogous ...

Added: July 12, 2026

Improving Differential Equation Solving in Compact Language Models via Activation Steering and Reinforcement Learning

Surkov A., Ignatenko V., Koltcov Sergei, Computers, Materials and Continua 2026

Large language models have recently demonstrated promising capabilities in mathematical reasoning; however, their performance on tasks requiring strict symbolic manipulation, such as solving differential equations, remains limited, especially for compact models. In this work, we investigate whether activation steering combined with reinforcement learning can improve the quality of solutions generated by pretrained language models without ...

Added: July 8, 2026

Computational Science and Its Applications – ICCSA 2026 Workshops

Springer, 2027.

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R & ...

Added: July 8, 2026

Conference Proceedings: 2026 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), 14-15 May 2026

IEEE, 2026.

The purpose of the 2026 IEEE Ural-Siberian Conference on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT) is to bring together researchers and practitioners from multiple areas of radio science, including biomedical engineering, radioelectronics, microelectronics, information technology, smart energy, information security and others. ...

Added: July 8, 2026

Modeling Specialized Routing Algorithms in Networks-on-Chip Represented by Series of Families of Circulant Topologies

Маликов М. А., Монахова Э. А., Rzaev E. et al., Proceedings of Kazan University. Physics and Mathematics Series 2026 Vol. 168 No. 2 P. 269–286

This article examines series of families of two-dimensional circulant networks with rectangular L-shapes, optimal in diameter, as network-on-chip topologies with a minimal number of crossings between the links and a bounded length of the maximum link that does not depend on the network size. New network-on-chip routing algorithms, which use the coordinates of three adjacent zeros in the ...

Added: July 8, 2026

Algorithmic overlaps as thermodynamic variables: From local to cluster Monte Carlo dynamics in critical phenomena

Pilé I., Deng Y., Shchur L., Physical Review B: Condensed Matter and Materials Physics 2026 Vol. 114 No. 1 Article 014101

We investigate the spatial overlap of successive spin configurations in Markov chain Monte Carlo simulations using the local Metropolis algorithm and the Swendsen-Wang and Wolff cluster algorithms. We examine the dynamics of these algorithms for models in different universality classes: Ising model, Potts model with three components, and four-state Potts model. The overlap of two ...

Added: July 6, 2026

The Image of the Older Generation in Russian Digital Discourse on the Family

Соколова Е. Н., Grigoreva M., Sign: Problematic Field of Media Education 2026 No. 1(59) P. 92–101

The article analyzes representations of grandmothers’ and grandfathers’ images in the digital family discourse of the Russian social media segment. Based on a corpus of more than two million public posts from September 2023 to September 2024 collected via Brand Analytics, we extracted a subcorpus of 82,138 posts mentioning the older generation. The study ...

Added: June 30, 2026