
Article

A Method for Accounting for Bigram Structure in Topic Models

Нокель М. А.
The paper presents the results of an experimental study of integrating word similarity and bigram collocations into topic models. First, we analyze a variety of word association measures in order to integrate top-ranked bigrams into topic models. Then we propose a modification of the original PLSA algorithm that takes into account similar unigrams and bigrams that share the same beginning. Finally, we present a novel unsupervised iterative algorithm demonstrating how topics can choose the most relevant bigrams. As the target text collection we took articles from various Russian electronic banking magazines. The experiments demonstrate a significant improvement in topic model quality.
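To illustrate what ranking bigrams by a word association measure can look like, here is a minimal sketch. The abstract does not name the measures that were analyzed, so pointwise mutual information (PMI) is used here purely as an assumed example; the function name `rank_bigrams_by_pmi`, the `min_count` threshold, and the toy sentence are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI is only one possible association measure, chosen here as an
    assumption for illustration; it is not claimed to be the paper's choice.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        # Skip rare pairs: PMI is unreliable for low-frequency bigrams.
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigram_counts[w1] / n_unigrams
        p_w2 = unigram_counts[w2] / n_unigrams
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2))

    # Highest-scoring (most strongly associated) bigrams first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy input, invented for this example.
tokens = "interest rate rises as the central bank lifts the interest rate".split()
ranked = rank_bigrams_by_pmi(tokens, min_count=2)
```

In a pipeline of the kind the abstract describes, the top-ranked pairs from such a list would be the candidates added to the topic model vocabulary alongside unigrams.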