Additive Regularization of Topic Models

K. V. Vorontsov

P. 88.

Vorontsov K. V.

Language: Russian

Keywords: topic models, additive regularization

In book

Mathematical Methods of Pattern Recognition: 16th All-Russian Conference, Kazan, September 6–12, 2013: Book of Abstracts.

Moscow: Torus Press, 2013.

Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization

Konstantin Vorontsov, Anna Potapenko, in: Communications in Computer and Information Science, Vol. 436: Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10–12, 2014, Revised Selected Papers. Cham: Springer, 2014. P. 29–46.

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models. ARTM is free of redundant probabilistic assumptions and provides a simple inference for many combined and multi-objective topic models. ...
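The abstract above does not spell out the inference procedure, so the following is only a minimal sketch of a regularized EM iteration in the spirit of ARTM, as it is commonly described in the ARTM literature. The function names `artm_em` and `normalize_columns`, the matrix names `phi` and `theta`, and the choice of regularizer (a log-type smoothing/sparsing term with coefficients `beta` and `alpha`) are assumptions made for illustration. With `beta = alpha = 0` the loop reduces to plain PLSA.

```python
import numpy as np


def normalize_columns(m):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    s = m.sum(axis=0, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m), where=s > 0)


def artm_em(n_dw, num_topics, beta=0.0, alpha=0.0, num_iters=50, seed=0):
    """Regularized EM sketch for a topic model.

    n_dw: (D, W) array of word counts per document.
    beta, alpha: assumed regularization coefficients; positive values smooth,
    negative values sparsify (the update is truncated at zero).
    Returns phi of shape (W, T) and theta of shape (T, D).
    """
    rng = np.random.default_rng(seed)
    n_wd = np.asarray(n_dw, dtype=float).T                      # (W, D)
    num_words, num_docs = n_wd.shape
    phi = normalize_columns(rng.random((num_words, num_topics)))
    theta = normalize_columns(rng.random((num_topics, num_docs)))
    for _ in range(num_iters):
        # Model prediction p(w|d) = sum_t phi_wt * theta_td.
        p_wd = phi @ theta
        ratio = np.where(p_wd > 0, n_wd / np.maximum(p_wd, 1e-12), 0.0)
        # Expected counts, with the E-step folded in:
        # n_wt = sum_d n_dw * p(t|d,w), n_td = sum_w n_dw * p(t|d,w).
        n_wt = phi * (ratio @ theta.T)
        n_td = theta * (phi.T @ ratio)
        # Regularized M-step: add phi * dR/dphi (here the constant beta),
        # truncate at zero, and renormalize.
        phi = normalize_columns(np.maximum(n_wt + beta, 0.0))
        theta = normalize_columns(np.maximum(n_td + alpha, 0.0))
    return phi, theta
```

Other regularizers would change only the two additive terms in the M-step, which is the sense in which several requirements can be combined additively.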

Added: December 5, 2014

Additive Regularization for Topic Models of Text Collections

Vorontsov K. V., Doklady Akademii Nauk 2014 Vol. 89 No. 3 P. 301–304

ARTM advantages: ARTM is much simpler than Bayesian inference; ARTM focuses on formalizing task-specific requirements; ARTM simplifies the learning of multi-objective PTMs; ARTM reduces barriers to entry into the PTM research field; ARTM encourages the development of a regularization library. ARTM restrictions: choosing a regularization path is a new open issue for PTMs ...

Added: December 5, 2014

Additive Regularization for Hierarchical Multimodal Topic Modeling

N. A. Chirkova, K. V. Vorontsov, Journal of machine learning and data analysis 2016 Vol. 2 No. 2 P. 187–200

Probabilistic topic models uncover the latent semantics of text collections and represent each document by a multinomial distribution over topics. Hierarchical models divide topics into subtopics recursively, thus simplifying information retrieval, browsing and understanding of large multidisciplinary collections. Most existing approaches to hierarchy learning rely on Bayesian inference. This makes difficult the incorporation ...

Added: October 19, 2017

Topic Models Can Improve Domain Term Extraction

Elena Bolshakova, Natalia Loukachevitch, Nokel M., in: Proc. 35th European Conference on Information Retrieval (ECIR 2013): Advances in Information Retrieval, Vol. 7814. Springer, 2013. P. 684–687.

Abstract. The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models and demonstrate that topic information improves the quality of term extraction, and that NMF with KL-divergence minimization is the best among the models under study. ...
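For readers unfamiliar with NMF under KL-divergence, a minimal sketch follows. It uses the classical multiplicative update rules for the generalized KL objective; the abstract does not say which solver or settings the authors used, so the function names `nmf_kl` and `kl_divergence`, the parameters, and the random initialization are all assumptions for illustration.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), using the convention 0*log(0) = 0."""
    mask = V > 0
    log_term = np.sum(V[mask] * np.log(V[mask] / (WH[mask] + eps)))
    return float(log_term - V.sum() + WH.sum())


def nmf_kl(V, rank, num_iters=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix V (n x m) as W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(num_iters):
        # Update H: H <- H * (W^T (V / WH)) / (column sums of W).
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update W: W <- W * ((V / WH) H^T) / (row sums of H).
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

In a term-extraction setting, V would typically be a word-by-document count matrix, with the columns of W read as topics; that mapping is also an assumption here, not a detail given in the abstract.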

Added: October 1, 2014

Regularization of a Probabilistic Topic Model for Extracting Topic Kernels

Potapenko A., in: Lomonosov-2014: Proceedings of the XXI International Scientific Conference of Students, Postgraduates and Young Scientists: Section "Computational Mathematics and Cybernetics". Moscow: Publishing Department of the Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, 2014. P. 80–82.

Probabilistic topic modeling is a modern tool for statistical text analysis, designed to reveal the topics of document collections. The problem of building a topic model has infinitely many solutions, which leads to instability and poor interpretability of topics. To address these problems, the additive regularization of topic models (ARTM) approach is applied. Topic interpretability is formalized through the notion of a "kernel", and regularizers are introduced that promote their ...
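The non-uniqueness mentioned in this abstract can be made concrete in matrix form. The notation below (a word-topic matrix \Phi, a topic-document matrix \Theta, a transformation S, regularizers R_i with coefficients \tau_i) is assumed from common usage in the ARTM literature and is not taken from the abstract itself.

```latex
% Topic model as a stochastic matrix factorization:
p(w \mid d) = \sum_{t \in T} \phi_{wt}\,\theta_{td}

% Any nonsingular S for which \Phi' = \Phi S and \Theta' = S^{-1}\Theta
% remain stochastic gives a different solution with the same likelihood:
\Phi\Theta = (\Phi S)(S^{-1}\Theta) = \Phi'\Theta'

% Additive regularization selects among these solutions by maximizing
\sum_{d \in D}\sum_{w \in d} n_{dw} \ln \sum_{t \in T} \phi_{wt}\theta_{td}
  \;+\; \sum_{i} \tau_i R_i(\Phi, \Theta) \;\to\; \max_{\Phi,\Theta}
```

Since every admissible S yields the same product, the likelihood alone cannot distinguish between the solutions, and the additional criteria R_i are what narrow the choice.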

Added: December 23, 2014

Breeds of cooccurrence: an attempt at classification

Roytberg M.A., Roytberg A.M., Khachko D. V., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Bekasovo, May 29 - June 2, 2013). In 2 vols. Vol. 1: Main Conference Program. Issue 12 (19). Moscow: RGGU, 2013. P. 568–578.

The paper proposes a substantial classification of collocates (pairs of words that tend to cooccur) along with heuristics that can help to attribute a word pair to a proper type automatically. The best studied type is frequent phrases, which includes idioms, lexicographic collocations, and syntactic selection. Pairs of this type are known to occur at a ...

Added: May 6, 2014

Topic Models: Accounting for Similarity between Unigrams and Bigrams

M. A. Nokel, in: Selected Papers of XVI All-Russian Scientific Conference "Digital libraries: Advanced Methods and Technologies, Digital Collections", Vol. 1297. Dubna: CEUR Workshop Proceedings, 2014. P. 243–252.

The paper presents the results of experiments on adding similarity between unigrams and bigrams to topic models. First, the possibility of using association measures to select bigrams for subsequent inclusion in topic models is studied. Then a modification of the original PLSA algorithm is proposed that takes into account similar unigrams and bigrams beginning with the same letters. Finally, a new iterative algorithm is proposed ...
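The abstract does not name the association measures that were compared. As one illustrative possibility, the sketch below ranks bigrams by pointwise mutual information (PMI), a widely used measure; the function name `rank_bigrams_by_pmi` and the `min_count` threshold are assumptions, not details from the paper.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=2):
    """Return [((w1, w2), pmi), ...] sorted by PMI, highest first.

    PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ), estimated from counts.
    Bigrams seen fewer than min_count times are skipped, because PMI is
    unreliable for rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_first = unigrams[first] / n_uni
        p_second = unigrams[second] / n_uni
        scores[(first, second)] = math.log(p_pair / (p_first * p_second))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The top-ranked pairs could then be added to the vocabulary as single tokens before fitting a topic model, which is one plausible reading of "subsequent inclusion".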

Added: December 18, 2014

Using Topic Models for Single-Word Term Extraction

Nokel M. A., Loukachevitch N. V., in: Selected Papers of the 15th All-Russian Scientific Conference "Digital Libraries: Advanced Methods and Technologies, Digital Collections", Yaroslavl, Russia, October 14-17, 2013, Vol. 1108. CEUR Workshop Proceedings, 2013. P. 52–60.

The paper presents the results of experiments on applying topic models to the task of single-word term extraction. The text collections were a selection of articles from Russian-language online banking magazines and the English part of the Europarl parallel corpus. The experiments show that using topic information significantly improves the quality of single-word term extraction regardless of the subject domain and the language used. ...

Added: October 1, 2014

Using Linguistic Information in the PLSA Topic Model

Nokel M., in: Proceedings of the XXI International Conference of Students, Postgraduates and Young Scientists "Lomonosov-2014". Moscow: Moscow State University Press, 2014. P. 120–121.

This work proposes a method for preprocessing a collection of Russian-language texts that improves the quality of topic models ...

Added: October 1, 2014

A Method for Accounting for Bigram Structure in Topic Models

Nokel M., Vestnik Voronezhskogo gosudarstvennogo universiteta. Seriya: Sistemnyi analiz i informatsionnye tekhnologii 2014 No. 4 P. 89–97

The paper presents the results of an experimental study of integrating word similarity and bigram collocations into topic models. First of all, we analyze a variety of word association measures in order to integrate top-ranked bigrams into topic models. Then we propose a modification of the original PLSA algorithm, which takes into account similar unigrams and ...

Added: March 15, 2016

Topic Models in the Task of Single-Word Term Extraction

M. A. Nokel, N. V. Loukachevitch, Programmnaya inzheneriya 2014 No. 3 P. 34–40

The paper describes the results of an experimental study of statistical topic models applied to the task of automatic single-word term extraction. The English part of the Europarl parallel corpus from the socio-political domain and the Russian articles taken from online banking magazines were used as target text collections. The experiments demonstrate that topic information ...

Added: October 1, 2014

Topic Models: Adding Bigrams and Accounting for Similarity between Unigrams and Bigrams

Nokel M., Loukachevitch N. V., Vychislitel'nye metody i programmirovanie 2015 Vol. 16 No. 2 P. 215–234

The results of an experimental study of adding bigrams and taking into account the similarity between them and unigrams are discussed. A novel PLSA-SIM algorithm based on a modification of the original PLSA (Probabilistic Latent Semantic Analysis) algorithm is proposed. The proposed algorithm incorporates bigrams and takes into account the similarity between them and unigram components. ...

Added: March 15, 2016