Topic modelling is an area of text mining that has been actively developed over the last 15 years. A probabilistic topic model extracts a set of hidden topics from a collection of text documents. It defines each topic by a probability distribution over words and describes each document by a probability distribution over topics. Applications often impose many requirements, such as problem-specific knowledge and additional data that must be taken into account. It is therefore natural to treat topic modelling as a multiobjective optimization problem. Historically, however, Bayesian learning became the most popular approach to topic modelling. In the Bayesian paradigm, all requirements are formalized in terms of a probabilistic generative process. This approach is not always convenient because of its limitations and technical difficulties. In this work, we develop a non-Bayesian multiobjective approach called Additive Regularization of Topic Models (ARTM). It is based on regularized Maximum Likelihood Estimation (MLE), and we show that many well-known Bayesian topic models can be reformulated much more simply from the regularization point of view. We review some of the most important types of topic models: multimodal, multilingual, temporal, hierarchical, graph-based, and short-text. The ARTM framework makes it easy to combine different types of models into new models with the properties an application requires. This modular 'lego-style' technology for topic modelling is implemented in the open-source library BigARTM. © 2017 FRUCT.
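To make the regularized-MLE idea concrete, the following is a minimal sketch of an EM iteration for a topic model with an additive regularizer. It is an illustration under stated assumptions, not the BigARTM API: the function name `artm_em`, the parameters `beta` and `alpha`, and the choice of a smoothing/sparsing regularizer R = beta * sum(log phi) + alpha * sum(log theta) are all assumptions made here. With that regularizer, the M-step adds a constant to the expected counts and truncates negative values to zero before normalizing; setting both coefficients to zero gives plain maximum-likelihood estimation without regularization.

```python
import numpy as np

def _normalize_columns(x, eps=1e-12):
    # Scale each column to sum to 1; an all-zero column is left as zeros.
    return x / np.maximum(x.sum(axis=0, keepdims=True), eps)

def artm_em(n_dw, num_topics, beta=0.0, alpha=0.0, num_iters=50, seed=0):
    """Regularized EM sketch (hypothetical helper, not a library call).

    n_dw  : (D, W) array of word counts per document.
    beta  : regularization coefficient for word-in-topic probabilities phi
            (positive smooths, negative sparsifies); an assumption here.
    alpha : the same for topic-in-document probabilities theta.
    Returns phi of shape (W, T) and theta of shape (T, D).
    """
    rng = np.random.default_rng(seed)
    num_docs, num_words = n_dw.shape
    phi = _normalize_columns(rng.random((num_words, num_topics)))
    theta = _normalize_columns(rng.random((num_topics, num_docs)))
    eps = 1e-12
    for _ in range(num_iters):
        # E-step, folded into matrix form: p(w|d) = sum_t phi[w,t] * theta[t,d].
        p_wd = phi @ theta
        ratio = n_dw.T / np.maximum(p_wd, eps)      # n_dw / p(w|d), shape (W, D)
        # Expected counts: n_wt = sum_d n_dw * p(t|d,w), n_td = sum_w n_dw * p(t|d,w).
        n_wt = phi * (ratio @ theta.T)
        n_td = theta * (phi.T @ ratio)
        # Regularized M-step: add phi * dR/dphi (= beta here), clip at zero, normalize.
        phi = _normalize_columns(np.maximum(n_wt + beta, 0.0))
        theta = _normalize_columns(np.maximum(n_td + alpha, 0.0))
    return phi, theta

# Usage on a tiny hypothetical corpus: 4 documents, 4 words, 2 topics.
counts = np.array([[5, 3, 0, 0],
                   [4, 4, 0, 1],
                   [0, 0, 6, 2],
                   [0, 1, 3, 5]], dtype=float)
phi, theta = artm_em(counts, num_topics=2, beta=0.1, alpha=0.1)
```

In this sketch, several requirements would be combined by summing their regularizers, so each one contributes its own additive term inside the `np.maximum` calls.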
Given a Lévy process $(L_t)_{t\ge 0}$ and an independent nondecreasing process (time change) $(T(t))_{t\ge 0}$, we consider the problem of statistical inference on $T$ based on low-frequency observations of the time-changed Lévy process $L_{T(t)}$. Our approach is based on the genuine use of Mellin and Laplace transforms. We propose a consistent estimator for the density of the increments of $T$ in a stationary regime, derive its convergence rates and prove the optimality of the rates. It turns out that the convergence rates heavily depend on the decay of the Mellin transform of $T$. Finally, the performance of the estimator is analysed via a Monte Carlo simulation study.
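The observation scheme can be illustrated with a small simulation. The abstract does not specify the simulation design, so the sketch below makes its own assumptions: the Lévy process is taken to be a Brownian motion with drift, the time change is a gamma process with independent stationary increments, and the function name and parameters are hypothetical. It generates only the data, namely the low-frequency increments of the time-changed process, and does not implement the Mellin-transform estimator.

```python
import numpy as np

def simulate_time_changed_bm(n, delta=1.0, gamma_shape=1.0, gamma_scale=1.0,
                             drift=0.0, sigma=1.0, seed=0):
    """Simulate n low-frequency increments of Y(t) = L(T(t)).

    Assumptions (illustrative only):
      L is a Brownian motion with the given drift and volatility sigma;
      T is a gamma process, so its increment over a step of length delta
      is Gamma(shape=gamma_shape * delta, scale=gamma_scale);
      L and T are independent.
    Returns (dT, dY): the hidden increments of T and the observed increments of Y.
    """
    rng = np.random.default_rng(seed)
    # Increments of the time change over each observation interval.
    dT = rng.gamma(shape=gamma_shape * delta, scale=gamma_scale, size=n)
    # Conditionally on dT, the increment of L over that much time is Gaussian
    # with mean drift * dT and variance sigma^2 * dT.
    dY = drift * dT + sigma * np.sqrt(dT) * rng.standard_normal(n)
    return dT, dY

# Usage: the statistician sees only dY; dT is kept to compare against an estimate.
dT, dY = simulate_time_changed_bm(n=1000, delta=1.0)
```

Returning the hidden increments alongside the observed ones is a convenience for a Monte Carlo study, where an estimated density of the increments of T can be checked against the simulated truth.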