Stable Topic Modeling with Local Density Regularization
Topic modeling has emerged over the last decade as a powerful tool for analyzing large text corpora, including Web-based user-generated texts. Topic stability, however, remains a concern: topic models have a very complex optimization landscape with many local maxima, so even different runs of the same model can yield very different topics. To add stability to topic modeling, we propose an approach based on local density regularization, in which words in the local context window of a given word have a higher probability of receiving the same topic as that word. We compare several models with local density regularizers and show that they improve topic stability while remaining on par with classical models in terms of quality metrics.
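The idea of biasing a word's topic toward the topics of its neighbors can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's exact regularizer: the function name, the multiplicative boost, and the strength parameter `lam` are all illustrative choices.

```python
def topic_weights_with_window(base_weights, window_topics, lam):
    """Illustrative local-density boost (hypothetical form):
    multiply a topic's sampling weight by (1 + lam * count), where
    count is how many words in the local context window currently
    carry that topic. base_weights[t] is the unnormalized weight of
    topic t for the current word."""
    counts = {}
    for t in window_topics:
        counts[t] = counts.get(t, 0) + 1
    return [w * (1.0 + lam * counts.get(t, 0))
            for t, w in enumerate(base_weights)]

# With uniform base weights and neighbors assigned topics [2, 2, 0],
# topic 2 receives the largest boost.
boosted = topic_weights_with_window([1.0, 1.0, 1.0], [2, 2, 0], lam=0.5)
```

Under this toy scheme, topics that dominate the context window become proportionally more likely for the current word, which is the stabilizing effect the abstract describes.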
This volume is dedicated to the 80th anniversary of academician V. M. Matrosov. The book contains reviews and original articles that address the development of the method of vector Lyapunov functions, questions of stability and stabilization control in mechanical systems, stability in differential games, the study of systems with multirate time, and other topics. The articles were prepared specially for this edition.
Subsystem ASONIKA-T can operate in standalone mode or as part of ASONIKA in combination with other subsystems. Subsystem ASONIKA-T is designed to automate the modeling of thermal processes in structures such as microassemblies, radiators, heat-removing bases, hybrid integrated modules, power cordwood structures, cabinets, racks, and atypical (arbitrary) electronics structures.
The goal of the conference is to help build cross-disciplinary networks of analysts, software specialists, and researchers to advance the use of textual information in multiple science, technology, and business development fields. Within this context, conference themes will include, but are not limited to:
Data: Sourcing, preparing, and interpreting data sources, including patents, publications, web scraping, and other novel data sources
Text-mining tools and methods: Best practices in software-based topic modeling, clumping, association rules, term manipulation, text manipulation, etc.; visualization
Applied research: Future-Oriented Technology Analysis (FTA); intelligence gathering to support decision-making in the private sector (e.g., Management of Technology)
This paper examines two Markov chain Monte Carlo methods that have been widely used in econometrics. An introductory exposition of the Metropolis algorithm and the Gibbs sampler is provided. These methods are used to simulate multivariate distributions. Many problems in Bayesian statistics can be solved by simulating the posterior distribution. The invariance condition is of central importance, and proofs are given for both methods. We use finite Markov chains to explore and substantiate the methods. Several examples are provided to illustrate the applicability and efficiency of the Markov chain Monte Carlo methods. These include a bivariate normal distribution with high correlation, a bivariate exponential distribution, and a mixture of bivariate normals.
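The first of those examples, a highly correlated bivariate normal, is a standard Gibbs-sampling exercise and can be sketched in a few lines. This is a generic textbook construction, not the paper's own code: for a standard bivariate normal with correlation rho, the full conditionals are x | y ~ N(rho*y, 1 - rho^2) and symmetrically for y | x, so the sampler simply alternates these two draws.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Alternates exact conditional draws x | y ~ N(rho*y, 1 - rho^2)
    and y | x ~ N(rho*x, 1 - rho^2); draws after burn-in approximate
    the joint target distribution.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)   # draw x given current y
        y = rng.gauss(rho * x, sd)   # draw y given the new x
        if i >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.95, n_samples=20000)
```

With rho near 1 the chain mixes slowly, which is exactly why this case is a useful efficiency illustration: successive draws are strongly autocorrelated, so many iterations are needed for the sample moments to settle.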
An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
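The interval restriction can be pictured as a mask applied during topic sampling: when a seed keyword is resampled, topics outside its designated interval get zero probability. The sketch below is an assumption-laden illustration of that one step, not the authors' implementation; the function name, the interval dictionary, and the example seed word are all hypothetical.

```python
import random

def sample_topic(word, weights, seed_intervals, rng=random):
    """Sample a topic index from unnormalized per-topic weights.

    If the word is a seed keyword, topics outside its allowed interval
    [lo, hi] are masked to zero before sampling (illustrative sketch of
    the interval semi-supervised constraint)."""
    if word in seed_intervals:
        lo, hi = seed_intervals[word]
        weights = [w if lo <= t <= hi else 0.0  # mask disallowed topics
                   for t, w in enumerate(weights)]
    r = rng.uniform(0.0, sum(weights))          # inverse-CDF draw
    acc = 0.0
    for t, w in enumerate(weights):
        acc += w
        if r <= acc:
            return t
    return len(weights) - 1

# Hypothetical seed word pinned to topics 0..2 out of 10:
intervals = {"ethnicity": (0, 2)}
```

Unrestricted words sample from all topics as in ordinary collapsed Gibbs sampling; only the predefined keywords are confined to their intervals.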
In this paper we introduce a generalized learning algorithm for probabilistic topic models (PTM). Many known and new algorithms for PLSA, LDA, and SWB models can be obtained as its special cases by choosing a subset of the following "options": regularization, sampling, update frequency, sparsing, and robustness. We show that a robust topic model, which distinguishes specific, background, and topic terms, does not need Dirichlet regularization and provides a controllably sparse solution.