Using Linguistic Information in the PLSA Topic Model
P. 120-121.
Nokel M.
In book
Moscow: MSU Publishing House, 2014
Nokel M., Loukachevitch N. V., Numerical Methods and Programming 2015 Vol. 16 No. 2 P. 215-234
The results of an experimental study of adding bigrams and taking into account the similarity between bigrams and unigrams are discussed. A novel PLSA-SIM algorithm, based on a modification of the original PLSA (Probabilistic Latent Semantic Analysis) algorithm, is proposed. The proposed algorithm incorporates bigrams and takes into account the similarity between them and their unigram components. ...
Added: March 15, 2016
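PLSA-SIM is described above as a modification of the original PLSA algorithm. For reference, the following is a minimal sketch of plain PLSA fitted by EM on a toy bag-of-words collection. It does not implement the PLSA-SIM similarity modification, whose details are not given in this abstract. The function name `plsa` and the toy data are hypothetical.

```python
import random

def plsa(counts, num_topics, iters=50, seed=0):
    """Fit plain PLSA with EM (hypothetical helper, not PLSA-SIM).

    counts: list of dicts, counts[d][w] = frequency of term w in document d.
    Returns (phi, theta): phi[t][w] = p(w|t), theta[d][t] = p(t|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})
    # Random positive initialization of p(w|t), normalized per topic.
    phi = []
    for _ in range(num_topics):
        row = {w: rng.random() + 1e-3 for w in vocab}
        s = sum(row.values())
        phi.append({w: v / s for w, v in row.items()})
    theta = [[1.0 / num_topics] * num_topics for _ in counts]

    for _ in range(iters):
        n_wt = [{w: 0.0 for w in vocab} for _ in range(num_topics)]
        n_td = [[0.0] * num_topics for _ in counts]
        for d, doc in enumerate(counts):
            for w, n_dw in doc.items():
                # E-step: p(t|d,w) is proportional to p(w|t) * p(t|d).
                joint = [phi[t][w] * theta[d][t] for t in range(num_topics)]
                z = sum(joint)
                if z == 0:
                    continue
                for t in range(num_topics):
                    c = n_dw * joint[t] / z
                    n_wt[t][w] += c
                    n_td[d][t] += c
        # M-step: normalize the expected counts into distributions.
        for t in range(num_topics):
            s = sum(n_wt[t].values())
            phi[t] = {w: (v / s if s else 1.0 / len(vocab)) for w, v in n_wt[t].items()}
        for d in range(len(counts)):
            s = sum(n_td[d])
            theta[d] = [(v / s if s else 1.0 / num_topics) for v in n_td[d]]
    return phi, theta
```

Example usage: `plsa([{"bank": 3, "loan": 2}, {"term": 2, "corpus": 3}], 2)` returns one word distribution per topic and one topic distribution per document.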
Nokel M., Proceedings of Voronezh State University. Series: Systems Analysis and Information Technologies 2014 No. 4 P. 89-97
The paper presents the results of an experimental study of integrating word similarity and bigram collocations into topic models. First, we analyze a variety of word association measures in order to integrate top-ranked bigrams into topic models. Then we propose a modification of the original PLSA algorithm that takes into account similar unigrams and ...
Added: March 15, 2016
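The abstract above mentions ranking bigrams with word association measures before adding the top-ranked ones to a topic model. The paper compares several measures. The sketch below shows only one common choice, pointwise mutual information (PMI), computed from adjacent-token counts. The function name `rank_bigrams_pmi` and the `min_count` threshold are illustrative assumptions, not the paper's setup.

```python
import math
from collections import Counter

def rank_bigrams_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by PMI = log(p(x, y) / (p(x) * p(y))).

    Hypothetical helper: one association measure among many.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    # Highest-association pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The top of the returned list would be the candidate bigrams to add to the vocabulary. Raw PMI is known to favor rare pairs, which is why the frequency threshold is applied first.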
M. A. Nokel, in: Selected Papers of the XVI All-Russian Scientific Conference "Digital Libraries: Advanced Methods and Technologies, Digital Collections". Vol. 1297. Dubna: CEUR Workshop Proceedings, 2014. P. 243-252.
The paper presents the results of experiments on adding the similarity between unigrams and bigrams to topic models. First, it examines the applicability of association measures for selecting bigrams for subsequent inclusion in topic models. Then it proposes a modification of the original PLSA algorithm that takes into account similar unigrams and bigrams beginning with the same letters. Finally, the paper proposes a new iterative algorithm ...
Added: December 18, 2014
Elena Bolshakova, Natalia Loukachevitch, Nokel M., in: Proc. 35th European Conference on Information Retrieval (ECIR 2013): Advances in Information Retrieval. Vol. 7814. Springer, 2013. P. 684-687.
Abstract. The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models. They demonstrate that topic information improves the quality of term extraction, and that NMF with KL-divergence minimization performs best among the models under study. ...
Added: October 1, 2014
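The abstract above reports that NMF with KL-divergence minimization performed best among the models studied. The following is a minimal sketch of NMF under the generalized KL divergence, using the standard Lee-Seung multiplicative updates on a small nonnegative matrix. The function names, iteration count and toy data are assumptions, not the paper's configuration.

```python
import math
import random

def nmf_kl(V, rank, iters=200, seed=0, eps=1e-12):
    """Factor nonnegative V (m x n) as W (m x rank) times H (rank x n)
    by minimizing the generalized KL divergence D(V || WH) with
    Lee-Seung multiplicative updates (hypothetical helper)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]

    def product():
        return [[sum(W[i][a] * H[a][j] for a in range(rank)) for j in range(n)]
                for i in range(m)]

    for _ in range(iters):
        WH = product()
        # H_aj <- H_aj * (sum_i W_ia V_ij / (WH)_ij) / sum_i W_ia
        for a in range(rank):
            col_sum = sum(W[i][a] for i in range(m)) + eps
            for j in range(n):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(m))
                H[a][j] *= num / col_sum
        WH = product()
        # W_ia <- W_ia * (sum_j H_aj V_ij / (WH)_ij) / sum_j H_aj
        for a in range(rank):
            row_sum = sum(H[a][j] for j in range(n)) + eps
            for i in range(m):
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(n))
                W[i][a] *= num / row_sum
    return W, H

def kl_divergence(V, W, H, eps=1e-12):
    """Generalized KL divergence sum(V log(V / WH) - V + WH)."""
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            wh = sum(W[i][a] * H[a][j] for a in range(len(H))) + eps
            total += (v * math.log(v / wh) if v > 0 else 0.0) - v + wh
    return total
```

In a term-extraction setting, V would be a term-by-document count matrix. The multiplicative form keeps every factor nonnegative without an explicit projection step.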
Vorontsov K. V., Doklady Akademii Nauk 2014 Vol. 89 No. 3 P. 301-304
ARTM advantages:
ARTM is much simpler than Bayesian inference
ARTM focuses on formalizing task-specific requirements
ARTM simplifies the learning of multi-objective PTMs (probabilistic topic models)
ARTM lowers the barriers to entry into the PTM research field
ARTM encourages the development of a regularization library
ARTM limitations:
Choosing a regularization path is a new open issue for PTMs ...
Added: December 5, 2014
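In ARTM, a regularizer R(Φ) enters the M-step as φ_wt = norm_w(n_wt + φ_wt ∂R/∂φ_wt), where norm_w zeroes negative values and normalizes over the vocabulary. The following is a minimal sketch that assumes a single regularizer, R = β Σ_w ln φ_wt. For this choice, φ_wt ∂R/∂φ_wt reduces to the constant β, so a positive β smooths a topic and a negative β sparsifies it. The function name `artm_m_step` is hypothetical.

```python
def artm_m_step(n_wt, beta):
    """Regularized M-step for one topic, assuming R = beta * sum_w ln(phi_wt).

    n_wt: dict mapping term -> expected count n_wt from the E-step.
    For this regularizer phi_wt * dR/dphi_wt = beta, so
    phi_wt = max(n_wt + beta, 0) / sum_v max(n_vt + beta, 0).
    """
    shifted = {w: max(n + beta, 0.0) for w, n in n_wt.items()}
    total = sum(shifted.values())
    if total == 0:
        # Every term was zeroed out, so the topic degenerates.
        return {w: 0.0 for w in n_wt}
    return {w: v / total for w, v in shifted.items()}
```

With counts `{"bank": 5.0, "loan": 3.0, "the": 0.5}` and `beta=-1.0`, the low-count term `"the"` drops to probability 0. Combining several regularizers, and choosing how their coefficients change across iterations, is the "regularization path" question raised above.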
M. A. Nokel, N. V. Loukachevitch, Programmnaya Ingeneria (Software Engineering) 2014 No. 3 P. 34-40
The paper describes the results of an experimental study of statistical topic models applied to the task of automatic single-word term extraction. The English part of the Europarl parallel corpus from the socio-political domain and the Russian articles taken from online banking magazines were used as target text collections. The experiments demonstrate that topic information ...
Added: October 1, 2014
Roytberg M. A., Roytberg A. M., Khachko D. V., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" (Bekasovo, May 29 - June 2, 2013). In 2 vols. Vol. 1: Main Conference Program. Issue 12 (19). Moscow: RSUH, 2013. P. 568-578.
The paper proposes a substantive classification of collocates (pairs of words that tend to co-occur), along with heuristics that can help assign a word pair to the proper type automatically.
The best-studied type is frequent phrases, which include idioms, lexicographic collocations, and syntactic selection. Pairs of this type are known to occur at a ...
Added: May 6, 2014
Nokel M. A., Loukachevitch N. V., in: Selected Papers of the 15th All-Russian Scientific Conference "Digital Libraries: Advanced Methods and Technologies, Digital Collections", Yaroslavl, Russia, October 14-17, 2013. Vol. 1108. CEUR Workshop Proceedings, 2013. P. 52-60.
The paper presents the results of experiments on applying topic models to the task of single-word term extraction. The text collections used were a set of Russian-language articles from online banking magazines and the English part of the Europarl parallel corpus. The experiments show that using topic information significantly improves the quality of single-word term extraction, regardless of the subject domain and the language. ...
Added: October 1, 2014
Nokel M., Loukachevitch N. V., in: Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015). Linköping: Linköping University Electronic Press, 2015. P. 145-152.
The paper describes the results of an empirical study of integrating bigram collocations, and the similarities between them and unigrams, into topic models. First, we propose a novel algorithm, PLSA-SIM, which is a modification of the original PLSA algorithm. It incorporates bigrams and maintains relationships between unigrams and bigrams based on their component structure. ...
Added: March 16, 2016
Sokolov E., Bogolubsky L., in: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications. NY: ACM, 2015. P. 21-27.
We propose a new method of feature extraction for regression problems with text data that transforms the sparse texts to dense features using regularized topic models. We also discuss the problem of topic model initialization, and propose a new approach based on Naive Bayes. This approach is compared to many others, and it achieves a ...
Added: February 24, 2016
Vorontsov K. V., Potapenko A., Computer Research and Modeling 2012 Vol. 4 No. 4 P. 693-706
We propose a generalized probabilistic topic model of text corpora that can incorporate the heuristics of Bayesian regularization, sampling, frequent parameter updates, and robustness in any combination. Well-known models such as PLSA, LDA, CVB0, SWB, and many others can be considered special cases of the proposed broad family of models. We propose the robust PLSA model ...
Added: February 19, 2015
Data-Driven Approach To Patient Flow Management And Resource Utilization In Urban Medical Facilities
Elizaveta S. Prokofyeva, Svetlana V. Maltseva, Fomichev N. et al., in: 2020 IEEE 22nd Conference on Business Informatics (CBI). IEEE, 2020. P. 71-77.
Healthcare services are tightly connected with complex data analysis techniques that enable optimal resource allocation in medical institutions. This paper proposes a detailed analysis of the incoming patient flow to a local polyclinic, integrating clustering techniques, process mining and the concept of self-organizing systems. The study takes into account concepts based on models of managing social ...
Added: August 31, 2020
Vorontsov K. V., in: Mathematical Methods for Pattern Recognition: 16th All-Russian Conference, Kazan, September 6-12, 2013: Book of Abstracts. Moscow: Torus Press, 2013. P. 88.
There is a pressing need to develop new principles for building topic models that are free of redundant probabilistic assumptions. The proposed theory of additive regularization of topic models (ARTM) addresses these problems. ...
Added: December 5, 2014
Bolshakova E. I., Loukachevitch N. V., Nokel M., in: Proc. 35th European Conference on Information Retrieval (ECIR 2013): Advances in Information Retrieval. Vol. 7814. Springer, 2013. P. 684-687.
The paper describes the results of an experimental study of topic models applied to the task of single-word term extraction. The experiments encompass several probabilistic and non-probabilistic topic models. They demonstrate that topic information improves the quality of term extraction, and that NMF with KL-divergence minimization performs best among the models under study. ...
Added: November 18, 2013
Vorontsov K. V., Potapenko A., Machine Learning and Data Analysis 2013 Vol. 1 No. 6 P. 657-686
Probabilistic topic models discover a low-dimensional, interpretable representation of text corpora by estimating a multinomial distribution over topics for each document and a multinomial distribution over terms for each topic. A unified family of expectation-maximization (EM)-like algorithms is considered, with smoothing, sampling, sparsing, and robustness heuristics that can be used in any combination. The ...
Added: February 19, 2015
Konstantin Vorontsov, Anna Potapenko, in: Communications in Computer and Information Science. Vol. 436: Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers. Cham: Springer, 2014. P. 29-46.
Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. In this tutorial we introduce a novel non-Bayesian approach, called Additive Regularization of Topic Models. ARTM is free of redundant probabilistic assumptions and provides a simple inference for many combined and multi-objective topic models. ...
Added: December 5, 2014
NY: ACM, 2015
The main objective of the workshop is to bring together researchers who are interested in applications of topic models and improving their output. Our goal is to create a broad platform for researchers to share ideas that could improve the usability and interpretation of topic models. We hope this workshop will promote topic model applications ...
Added: February 24, 2016
Nokel M., Loukachevitch N. V., in: NAACL HLT 2015 11th Workshop on Multiword Expressions MWE 2014. NY: Association for Computational Linguistics, 2015. P. 1-9.
The paper describes the results of an empirical study of integrating bigram collocations, and the similarities between them and unigrams, into topic models. First, we propose a novel algorithm, PLSA-SIM, which is a modification of the original PLSA algorithm. It incorporates bigrams and maintains relationships between unigrams and bigrams based on their component structure. ...
Added: March 16, 2016