Modifications of the EM algorithm for probabilistic topic modeling

Воронцов К. В., Потапенко А. А.

Машинное обучение и анализ данных (Machine Learning and Data Analysis). 2013. Vol. 1. No. 6. P. 657–686.

Probabilistic topic models discover a low-dimensional interpretable representation of text corpora
by estimating a multinomial distribution over topics for each document and a multinomial
distribution over terms for each topic. A unified family of expectation-maximization (EM)-like
algorithms is considered, with smoothing, sampling, sparsing, and robustness heuristics that can
be used in any combination. The known models PLSA (probabilistic latent semantic analysis),
LDA (latent Dirichlet allocation), and SWB (special words with background), as well as new
ones, can be considered special cases of this broad family of models. A new simple robust
algorithm is proposed that is suitable for sparse models and does not require estimating and
storing a large matrix of noise parameters. The authors experimentally find optimal combinations
of heuristics with sparsing strategies and discover that a sparse robust model without Dirichlet
smoothing performs very well, giving more than 99% zeros in the multinomial distributions
without loss of perplexity.
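
The EM scheme summarized in the abstract can be illustrated for the plain PLSA case. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `plsa_em`, `sparsify`, and `normalize`, the parameter `sparsity_threshold`, and the simple thresholding rule are all hypothetical, and the smoothing, sampling, and robustness heuristics of the paper are not reproduced. Here `phi[t][w]` approximates p(w|t) and `theta[d][t]` approximates p(t|d).

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1; use uniform if all are zero."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def sparsify(probs, threshold):
    """Illustrative sparsing rule (an assumption): zero out small probabilities."""
    if threshold <= 0:
        return probs
    kept = [p if p >= threshold else 0.0 for p in probs]
    if sum(kept) == 0:  # never zero out a whole distribution
        return probs
    return normalize(kept)


def plsa_em(counts, num_topics, iters=50, sparsity_threshold=0.0, seed=0):
    """Fit p(w|d) ~ sum_t phi[t][w] * theta[d][t] with EM.

    counts[d][w] is the number of times term w occurs in document d.
    """
    rng = random.Random(seed)
    num_docs, vocab = len(counts), len(counts[0])
    phi = [normalize([rng.random() for _ in range(vocab)])
           for _ in range(num_topics)]
    theta = [normalize([rng.random() for _ in range(num_topics)])
             for _ in range(num_docs)]
    for _ in range(iters):
        n_wt = [[0.0] * vocab for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(vocab):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: p(t|d,w) is proportional to phi[t][w] * theta[d][t].
                joint = [phi[t][w] * theta[d][t] for t in range(num_topics)]
                z = sum(joint)
                if z == 0:
                    # The sparse model cannot explain this term; skip it here.
                    continue
                for t in range(num_topics):
                    expected = n_dw * joint[t] / z
                    n_wt[t][w] += expected
                    n_td[d][t] += expected
        # M-step: normalize expected counts, then optionally sparsify.
        phi = [sparsify(normalize(row), sparsity_threshold) for row in n_wt]
        theta = [sparsify(normalize(row), sparsity_threshold) for row in n_td]
    return phi, theta


if __name__ == "__main__":
    toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    phi, theta = plsa_em(toy_counts, num_topics=2, sparsity_threshold=0.05)
    for row in phi:
        print([round(p, 3) for p in row])
```

With `sparsity_threshold=0.0` the loop reduces to ordinary PLSA. In this sketch a term that the sparsified model cannot explain is simply skipped in the E-step, which is a simplification chosen for brevity.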

Research target: Computer Science; Mathematics

Priority areas: IT and mathematics; mathematics

Language: Russian

Keywords: EM algorithm; latent Dirichlet allocation; probabilistic topic model; Bayesian inference; probabilistic latent semantic analysis
