Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics

Sergei Koltcov, A. Surkov, V. Filippov, V. Ignatenko

doi:10.7717/peerj-cs.1758

PeerJ Computer Science. 2024. Vol. 10. P. 41.

Topic modeling is a widely used instrument for the analysis of large text collections.
In the last few years, neural topic models and models with word embeddings have
been proposed to increase the quality of topic solutions. However, these models
have not been extensively tested in terms of stability and interpretability. Moreover,
selecting the number of topics (a model parameter) remains a challenging
task. We aim to partially fill this gap by testing four well-known topic models that
are available to a wide range of users: the embedded topic model (ETM), the
Gaussian Softmax distribution model (GSM), Wasserstein autoencoders with Dirichlet
prior (W-LDA), and Wasserstein autoencoders with Gaussian Mixture prior (WTM-GMM).
We demonstrate that W-LDA, WTM-GMM, and GSM possess poor stability,
which complicates their application in practice. The ETM model with additionally trained
embeddings demonstrates high coherence and rather good stability for large datasets,
but the question of the number of topics remains unsolved for this model. We also
propose a new topic model based on granulated sampling with word embeddings
(GLDAW), which demonstrates the highest stability and good coherence among the
considered models. Moreover, the optimal number of topics in a dataset can
be determined for this model.

Research target: Computer Science

Language: English

Keywords: coherence; topic modeling; Renyi entropy; stability; optimal number of topics; word embeddings; neural topic models

Publication based on the results of:

Innovative methods of data collection and analysis in the modeling of communicative behavior of Internet users and the development of respective technological solutions (2023)
