Modifications of the EM Algorithm for Probabilistic Topic Modeling
Probabilistic topic models discover a low-dimensional interpretable representation of text corpora
by estimating a multinomial distribution over topics for each document and a multinomial
distribution over terms for each topic. A unified family of expectation-maximization (EM) like
algorithms with smoothing, sampling, sparsing, and robustness heuristics that can be used in
any combination is considered. The known models PLSA (probabilistic latent semantic analysis),
LDA (latent Dirichlet allocation), and SWB (special words with background), as well as new
ones, can be considered special cases of the presented broad family of models. A new simple robust
algorithm suitable for sparse models, which does not require estimating and storing a large matrix
of noise parameters, is proposed. The authors experimentally find optimal combinations
of heuristics with sparsing strategies and discover that the sparse robust model without Dirichlet
smoothing performs very well, yielding more than 99% zeros in the multinomial distributions
without loss of perplexity.
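To make the shared core of this algorithm family concrete, the following is a minimal sketch of the rationalized EM iteration for plain PLSA (the base case without smoothing, sampling, or sparsing heuristics) on a term-document count matrix. The function name, argument names, and NumPy-based implementation are illustrative assumptions, not the authors' code; phi holds the term-topic distributions p(w|t) and theta the topic-document distributions p(t|d).

```python
import numpy as np

def plsa_em(N, T, iters=50, seed=0):
    """Rationalized EM for PLSA on a W x D term-document count matrix N.

    Illustrative sketch only. Returns:
      phi   -- W x T matrix, columns are term distributions p(w|t)
      theta -- T x D matrix, columns are topic distributions p(t|d)
    """
    rng = np.random.default_rng(seed)
    W, D = N.shape
    phi = rng.random((W, T))
    phi /= phi.sum(axis=0)            # normalize each topic: p(w|t)
    theta = rng.random((T, D))
    theta /= theta.sum(axis=0)        # normalize each document: p(t|d)
    for _ in range(iters):
        # E-step (implicit): model probabilities p(w|d) = sum_t phi_wt theta_td
        Z = phi @ theta
        Z[Z == 0] = 1e-12             # guard against division by zero
        R = N / Z                     # ratios n_dw / p(w|d)
        # M-step: aggregate expected topic counts
        #   n_wt = phi_wt * sum_d theta_td * R_wd
        #   n_td = theta_td * sum_w phi_wt * R_wd
        n_wt = phi * (R @ theta.T)
        n_td = theta * (phi.T @ R)
        phi = n_wt / n_wt.sum(axis=0)
        theta = n_td / n_td.sum(axis=0)
    return phi, theta
```

The heuristics discussed in the paper slot into this loop: Dirichlet smoothing adds pseudocounts to n_wt and n_td before normalization, while sparsing zeroes small entries of phi and theta and renormalizes.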