Topic Models Regularization and Initialization for Regression Problems
We propose a new feature extraction method for regression problems with text data that transforms sparse text representations into dense features using regularized topic models. We also discuss the problem of topic model initialization and propose a new approach based on Naive Bayes. Compared against a range of alternatives, this approach achieves quality comparable to vector space models using as few as ten topics. It also outperforms other topic-model-based feature generation methods, such as PLSA and Supervised LDA.
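
To make the idea of turning sparse texts into dense topic features concrete, the following is a minimal sketch of plain PLSA fitted with EM in pure Python. It is not the regularized model or the Naive Bayes initialization proposed here: the function name plsa_features, the random initialization, the toy documents, and all parameter values are assumptions made for illustration only.

```python
import random


def _normalize(values):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]


def plsa_features(docs, n_topics, n_iter=50, seed=0):
    """Fit plain PLSA with EM and return (theta, vocab).

    docs  : list of token lists
    theta : one dense row per document, theta[d][t] = p(topic t | doc d)
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # Sparse bag-of-words counts: one {word_id: count} dict per document.
    counts = []
    for doc in docs:
        row = {}
        for w in doc:
            row[index[w]] = row.get(index[w], 0) + 1
        counts.append(row)

    # Random initialization (an assumption of this sketch, used as a stand-in).
    phi = [_normalize([rng.random() + 0.01 for _ in vocab])
           for _ in range(n_topics)]
    theta = [_normalize([rng.random() + 0.01 for _ in range(n_topics)])
             for _ in docs]

    for _ in range(n_iter):
        n_wt = [[0.0] * len(vocab) for _ in range(n_topics)]
        n_td = [[0.0] * n_topics for _ in docs]
        for d, row in enumerate(counts):
            for w, n_dw in row.items():
                # E-step: p(t | d, w) is proportional to phi[t][w] * theta[d][t].
                post = [phi[t][w] * theta[d][t] for t in range(n_topics)]
                z = sum(post)
                if z == 0.0:
                    continue
                for t in range(n_topics):
                    r = n_dw * post[t] / z
                    n_wt[t][w] += r
                    n_td[d][t] += r
        # M-step: renormalize expected counts (tiny constant avoids 0/0).
        phi = [_normalize([v + 1e-12 for v in n_wt[t]])
               for t in range(n_topics)]
        theta = [_normalize([v + 1e-12 for v in n_td[d]])
                 for d in range(len(docs))]
    return theta, vocab


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["great", "plot", "great", "acting"],
    ["boring", "plot", "weak", "acting"],
    ["great", "film", "loved", "it"],
    ["weak", "film", "boring", "ending"],
]
theta, vocab = plsa_features(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each document, originally a sparse vector over the whole vocabulary, is reduced to a dense row of theta with one entry per topic. Under the assumptions of this sketch, those rows would then serve as the input features of an ordinary regression model.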