Using Linguistic Information in the PLSA Topic Model

M. A. Nokel

P. 120–121.

Nokel M.

Language: Russian

Keywords: topic models, PLSA, linguistic information

In book

Proceedings of the XXI International Conference of Students, Postgraduates and Young Scientists "Lomonosov-2014"

Moscow: Moscow State University Press, 2014.

Data-Driven Approach To Patient Flow Management And Resource Utilization In Urban Medical Facilities

Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Fomichev N. et al., in: 2020 IEEE 22nd Conference on Business Informatics (CBI). IEEE, 2020. P. 71–77.

Healthcare services are tightly connected with complex data analysis techniques to enable optimal resource allocation in medical institutions. This paper proposes a detailed analysis of incoming patient flow to a local polyclinic by integrating clustering techniques, process mining and a concept of self-organizing systems. The study takes into account concepts based on models of managing social ...

Added: August 31, 2020

A Method of Accounting Bigrams in Topic Models

Nokel M., Loukachevitch N. V., in: NAACL HLT 2015 11th Workshop on Multiword Expressions MWE 2014. NY: Association for Computational Linguistics, 2015. P. 1–9.

The paper describes the results of an empirical study of integrating bigram collocations and similarities between them and unigrams into topic models. First of all, we propose a novel algorithm PLSA-SIM that is a modification of the original algorithm PLSA. It incorporates bigrams and maintains relationships between unigrams and bigrams based on their component structure. ...

Added: March 16, 2016

Topic Models: Accounting Component Structure of Bigrams

Nokel M., Loukachevitch N. V., in: Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015). Linköping: Linköping University Electronic Press, 2015. P. 145–152.

The paper describes the results of an empirical study of integrating bigram collocations and similarities between them and unigrams into topic models. First of all, we propose a novel algorithm PLSA-SIM that is a modification of the original algorithm PLSA. It incorporates bigrams and maintains relationships between unigrams and bigrams based on their component structure. ...

Added: March 16, 2016

Topic Models: Adding Bigrams and Accounting for Similarity Between Unigrams and Bigrams

Nokel M., Loukachevitch N. V., Numerical Methods and Programming 2015 Vol. 16 No. 2 P. 215–234

The results of an experimental study of adding bigrams and taking into account the similarity between them and unigrams are discussed. A novel PLSA-SIM algorithm based on a modification of the original PLSA (Probabilistic Latent Semantic Analysis) algorithm is proposed. The proposed algorithm incorporates bigrams and takes into account the similarity between them and unigram components. ...

Added: March 15, 2016

A Method for Accounting for Bigram Structure in Topic Models

Nokel M., Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies 2014 No. 4 P. 89–97

The paper presents the results of an experimental study of integrating word similarity and bigram collocations into topic models. First of all, we analyze a variety of word association measures in order to integrate top-ranked bigrams into topic models. Then we propose a modification of the original algorithm PLSA, which takes into account similar unigrams and ...

Added: March 15, 2016

Topic Models Regularization and Initialization for Regression Problems

Sokolov E., Bogolubsky L., in: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications. NY: ACM, 2015. P. 21–27.

We propose a new method of feature extraction for regression problems with text data that transforms the sparse texts to dense features using regularized topic models. We also discuss the problem of topic model initialization, and propose a new approach based on Naive Bayes. This approach is compared to many others, and it achieves a ...

Added: February 24, 2016

Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications

NY: ACM, 2015.

The main objective of the workshop is to bring together researchers who are interested in applications of topic models and improving their output. Our goal is to create a broad platform for researchers to share ideas that could improve the usability and interpretation of topic models. We hope this workshop will promote topic model applications ...

Added: February 24, 2016

Modifications of the EM Algorithm for Probabilistic Topic Modeling

Vorontsov K. V., Potapenko A., Machine Learning and Data Analysis 2013 Vol. 1 No. 6 P. 657–686

Probabilistic topic models discover a low-dimensional interpretable representation of text corpora by estimating a multinomial distribution over topics for each document and a multinomial distribution over terms for each topic. A unified family of expectation-maximization (EM)-like algorithms with smoothing, sampling, sparsing, and robustness heuristics that can be used in any combination is considered. The ...

Added: February 19, 2015

Regularization, Robustness and Sparsity of Probabilistic Topic Models

Vorontsov K. V., Potapenko A., Computer Research and Modeling 2012 Vol. 4 No. 4 P. 693–706

We propose a generalized probabilistic topic model of text corpora which can incorporate heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models PLSA, LDA, CVB0, SWB, and many others can be considered as special cases of the proposed broad family of models. We propose the robust PLSA model ...

Added: February 19, 2015

Topic Models: Accounting for Similarity Between Unigrams and Bigrams

M. A. Nokel, in: Selected Papers of XVI All-Russian Scientific Conference "Digital libraries: Advanced Methods and Technologies, Digital Collections". Vol. 1297. Dubna: CEUR Workshop Proceedings, 2014. P. 243–252.

The paper presents the results of experiments on adding similarity between unigrams and bigrams to topic models. First, the possibility of applying association measures to select bigrams for subsequent inclusion in topic models is studied. Then a modification of the original PLSA algorithm is proposed that takes into account similar unigrams and bigrams beginning with the same letters. Finally, a new iterative algorithm is proposed ...

Added: December 18, 2014

Tutorial on Probabilistic Topic Modeling: Additive Regularization for Stochastic Matrix Factorization

Konstantin Vorontsov, Anna Potapenko, in: Communications in Computer and Information Science. Vol. 436: Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014 Yekaterinburg, Russia, April 10–12, 2014 Revised Selected Papers. Cham: Springer, 2014. P. 29–46.

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models. ARTM is free of redundant probabilistic assumptions and provides a simple inference for many combined and multi-objective topic models. ...

Added: December 5, 2014

Additive Regularization for Topic Models of Text Collections

Vorontsov K. V., Doklady Akademii Nauk 2014 Vol. 89 No. 3 P. 301–304

ARTM advantages: ARTM is much simpler than Bayesian inference; ARTM focuses on formalizing task-specific requirements; ARTM simplifies multi-objective PTM learning; ARTM reduces barriers to entry into the PTM research field; ARTM encourages the development of a regularization library. ARTM restrictions: choosing a regularization path is a new open issue for PTMs ...

Added: December 5, 2014

Additive Regularization of Topic Models

Vorontsov K. V., in: Mathematical Methods of Pattern Recognition: 16th All-Russian Conference, Kazan, September 6–12, 2013: Book of Abstracts. Moscow: Torus Press, 2013. P. 88.

There is a pressing need to develop new principles for building topic models that are free of redundant probabilistic assumptions. The proposed theory of additive regularization of topic models (ARTM) addresses these problems. ...

Added: December 5, 2014

Topic Models in the Task of Single-Word Term Extraction

M. A. Nokel, N. V. Loukachevitch, Software Engineering 2014 No. 3 P. 34–40

The paper describes the results of an experimental study of statistical topic models applied to the task of automatic single-word term extraction. The English part of the Europarl parallel corpus from the socio-political domain and the Russian articles taken from online banking magazines were used as target text collections. The experiments demonstrate that topic information ...

Added: October 1, 2014