Topic Models Regularization and Initialization for Regression Problems

Sokolov E., Bogolubsky L.

doi:10.1145/2809936.2809940

P. 21–27.

We propose a new method of feature extraction for regression problems with text data that transforms the sparse texts to dense features using regularized topic models. We also discuss the problem of topic model initialization, and propose a new approach based on Naive Bayes. This approach is compared to many others, and it achieves a quality comparable to vector space models using as little as ten topics. It also outperforms other methods for feature generation based on topic modeling, such as PLSA and Supervised LDA.

Language: English

Keywords: text mining, nonlinear optimization, topic models

In book

Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications

NY: ACM, 2015.

Book of abstracts of the IX International Conference on Optimization Methods and Applications (OPTIMA-2018), Petrovac, Montenegro, October 1-5, 2018

M.: [s.n.], 2018.

The book includes abstracts of reports presented at the IX International Conference on Optimization Methods and Applications "Optimization and Applications" (OPTIMA-2018), held in Petrovac, Montenegro, October 1-5, 2018. ...

Added: October 9, 2018

Human-centered text mining: A new software system

Kuznetsov S., Poelmans J., Elzinga P. et al., Lecture Notes in Computer Science 2012 Vol. 7377 LNAI P. 258–272

In this paper we introduce a novel human-centered data mining software system which was designed to gain intelligence from unstructured textual data. The architecture takes its roots in several case studies which were a collaboration between the Amsterdam-Amstelland Police, GasthuisZusters Antwerpen (GZA) hospitals and KU Leuven. It is currently being implemented by bachelor and master ...

Added: February 7, 2013

Global technology trends monitoring: theoretical frameworks and best practices

Sokolova A., Mikova N., Foresight and STI Governance 2014 Vol. 8 No. 4 P. 64–83

Theoretical and applied studies about monitoring technology trends are carried out by organizations at global, national, sectoral, and corporate levels. Demand for them comes from the government, business, academic institutions, as well as the general public. Qualitative methods (expert interviews, surveys, workshops, etc.) play a significant role in large practical projects. At the same time, ...

Added: January 16, 2015

Breeds of cooccurrence: an attempt at classification

Roytberg M.A., Roytberg A.M., Khachko D. V., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной Международной конференции «Диалог» (Бекасово, 29 мая - 2 июня 2013 г.). В 2-х т. Т. 1: Основная программа конференции. Вып. 12 (19). М.: РГГУ, 2013. P. 568–578.

The paper proposes a substantial classification of collocates (pairs of words that tend to cooccur) along with heuristics that can help to attribute a word pair to a proper type automatically. The best studied type is frequent phrases, which includes idioms, lexicographic collocations, and syntactic selection. Pairs of this type are known to occur at a ...

Added: May 6, 2014

Relying on Discourse Trees to Extract Medical Ontologies from Text

Galitsky B., Ilvovsky D., Goncharova E., in: Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science Vol. 12948. Springer, 2021. P. 215–231.

Added: October 28, 2021

A heuristic to find initial values for stochastic local search in SAT using continuous extensions of Boolean formulas

Kascheev N. I., Путихин Н. С., in: Proceedings of XV IEEE East-West Design & Test Symposium (EWDTS'2017). Piscataway: IEEE, 2017. P. 1–4.

Stochastic Local Search (SLS) is one of the most popular approaches to the Boolean satisfiability problem, and solvers based on this algorithm have made substantial progress over the years. However, nearly all state-of-the-art SLS solvers do not attempt to find a good starting point, instead using random values. We present a heuristic ...

Added: February 20, 2018

Text data classification methods: can the potential of quantitative analysis be used in qualitative research?

Aleksandrova M., ИНТЕРакция. ИНТЕРвью. ИНТЕРпретация 2021 Vol. 13 No. 2 P. 81–96

Text mining has developed rapidly in recent years. In this article, we compare classification methods that are suitable for solving problems of predicting item nonresponse. Based on this material, the author discusses how the analysis of textual data can be applied in a wider research field. The author considers a number of metrics ...

Added: August 20, 2021

What does Russian rap hide? Topic modeling of texts of the Russian-language hip-hop scene

Бойченко А. Е., Zhuchkova S., Журнал социологии и социальной антропологии 2020 Vol. 23 No. 2 P. 130–165

The study presents an attempt of the complex exploratory analysis of Russian rap based on the corpus of texts of the Russian-language songs of this genre. The corpus contains more than 11,000 texts that vary in their date of creation and popularity by more than 500 artists collected by automatically extracting data from web ...

Added: August 12, 2020

Assessment of Dendritic Cell Therapy Effectiveness Based on the Feature Extraction from Scientific Publications

Luparov A., Panov A. I., Suvorov R. et al., in: Proceedings of ICPRAM 2015 - 4th International Conference on Pattern Recognition Applications and Methods Vol. 2. SciTePress, 2015. P. 270–276.

Dendritic cell (DC) vaccination is a promising way to combat cancer metastases, especially in the case of immunogenic tumors. Unfortunately, it is only rarely possible to achieve a satisfactory clinical outcome in the majority of patients treated with a particular DC vaccine. Apparently, DC vaccination can be successful with certain combinations of features of the ...

Added: November 20, 2015

Technological Landscape of the Agriculture and Food Sector: A Long-Term Vision

Gokhberg L., Kuzminov I., Khabirova E., in: Bio#Futures. Foreseeing and Exploring the Bioeconomy. Cham: Springer, 2021. Ch. 10 P. 203–227.

This chapter presents an overview of global challenges and trends, as well as the technological landscape and future prospects for science, technology and innovation (STI) in the agriculture and food sector. Our study is based on a systemic mapping of trends and technologies with the combination of big data analysis (text mining) and expert-based methods. The focus ...

Added: May 17, 2021

Using Intelligent Text Analysis of Online Reviews to Determine the Main Factors of Restaurant Value Propositions

Fainshtein E., Serova E., in: Handbook of Research on Applied Data Science and Artificial Intelligence in Business and Industry. IGI Global, 2021. Ch. 10 P. 223–240.

Added: July 24, 2021

Forecasting technology trends using text mining of the gaps between science and technology: The case of perovskite solar cell technology

Li X., Xie Q., Daim T. et al., Technological Forecasting and Social Change 2019 Vol. 146 P. 432–449

How to detect and identify the future trends of emerging technologies as early as possible is crucial for government R&D strategic planning and enterprises' practices. To avoid the weakness of using only scientific papers or patents to study the development trends of emerging technologies, this paper proposes a framework that uses scientific papers and patents ...

Added: October 1, 2019

FCA Analyst Session and Data Access Tools in FCART

Neznanov A., Parinov A., in: Artificial Intelligence: Methodology, Systems, and Applications. 16th International Conference, AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings Vol. 8722. Dordrecht, L., Cham, Heidelberg, NY: Springer, 2014. P. 214–221.

Formal Concept Analysis Research Toolbox (FCART) is an integrated environment for knowledge and data engineers with a set of research tools based on Formal Concept Analysis. FCART allows a user to load structured and unstructured data (including texts with various metadata) from heterogeneous data sources into local data storage, compose scaling queries for data snapshots, and then ...

Added: October 14, 2014

Formal concept analysis in knowledge processing: A survey on applications

Poelmans J., Ignatov D. I., Kuznetsov S. et al., Expert Systems with Applications 2013 Vol. 40 No. 16 P. 6538–6560

This is the second part of a large survey paper in which we analyze recent literature on Formal Concept Analysis (FCA) and some closely related disciplines using FCA. We collected 1072 papers published between 2003 and 2011 mentioning terms related to Formal Concept Analysis in the title, abstract and keywords. We developed a knowledge browsing ...

Added: October 3, 2013

Modern Natural Language Processing Technologies for Strategic Analytics

Kuzminov I., Bakhtin P. D., Timofeev A. et al., Scientific and Technical Information Processing 2021 Vol. 48 No. 6 P. 467–475

This paper provides an overview of the latest natural language processing (NLP) technologies that can be applied in strategic analytics. The main problems in this field and specific tasks that can be solved using NLP tools are investigated. The main areas of application of these tools are considered. Recent advancements in NLP are discussed and ...

Added: March 23, 2022

Value Propositions of Restaurant Delivery Systems: A Text Mining-Based Review.

Fainshtein E., in: XIV International Scientific Conference "INTERAGROMASH 2021" Vol. 1: Precision Agriculture and Agricultural Machinery Industry. Springer, 2021. P. 475–483.

Due to the rapid development of e-commerce during the COVID pandemic, the demand for logistics and its importance are increasing. A satisfied customer can drive e-commerce business forward. As logistical needs become more complex and the logistics market becomes more competitive, service companies must strive to continually improve their value proposition to maintain their competitive edge. This ...

Added: November 1, 2021

Modern natural language processing technologies for solving strategic analytics problems

Kuzminov I., Bakhtin P. D., Timofeev A. et al., Искусственный интеллект и принятие решений 2020 No. 1 P. 3–16

The article is devoted to a review of the latest natural language processing (NLP) technologies that can be applied in strategic analytics. The introduction discusses the main problems in this area and specific tasks that can be solved using NLP tools. The article provides an overview of the main application areas in which these tools ...

Added: May 6, 2020

Analysis of Images, Social Networks and Texts. Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers

Berlin: Springer, 2014.

This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...

Added: November 13, 2014

Supplementary Proceedings of the 4th International Conference on Analysis of Images, Social Networks and Texts (AIST'2015)

Aachen: CEUR Workshop Proceedings, 2015.

This volume contains proceedings of the fourth conference on Analysis of Images, Social Networks and Texts (AIST’2015). The first three conferences in 2012–2014 attracted a significant number of students, researchers, academics and engineers working on interdisciplinary data analysis of images, texts, and social networks. The broad scope of AIST makes it an event where ...

Added: October 9, 2015

Information and analytical system for supporting risk management in regional industrial complexes

Кутергина Г. В., Lyadova L. N., Фролова Н. В., in: Applicable Information Models Issue 22. Sofia: ITHEA, 2011. P. 221–231.

The paper examines the current state of the risk management problem. It analyzes the typical shortcomings of existing risk management systems in commercial organizations and the reasons that prevent effective risk management systems from being implemented in large industrial complexes. It considers problems related to the lack of a full-fledged risk management methodology that would be based on modern information technologies adequate to the needs of the modern information society for managing ...

Added: May 24, 2013

Intangible-driven performance: Two decades searching for the Philosopher's Stone

Shakina E., Molodchik M., Parshakov P., Russian Management Journal 2020 Vol. 18 No. 3 P. 433–456

The study offers a structural literature review of the twenty-year evolution of the fast-growing research topic of intellectual capital (IC) and intangible-driven performance. Despite a rather short independent history, the IC concept has undergone a substantial transformation, bringing to the discussion vast empirical and methodological literature. Several endeavors carrying out literature review studies ...

Added: January 13, 2021

Text mining in the social sciences

Byzov A., Социология: методология, методы, математическое моделирование 2019 No. 49 P. 131–160

Throughout most of their history, sociologists have sought to study unstructured organic texts: newspaper materials, diaries, memoirs, letters, documents, and, more recently, messages, publications and other texts on various online platforms. This article discusses how modern techniques of text mining can improve classical sociological approaches to the analysis of this type of data. The article ...

Added: December 9, 2019

Detecting and Validating Global Technology Trends Using Quantitative and Expert-Based Foresight Techniques

Kuzminov I., Bakhtin P. D., Khabirova E. et al. / NRU Higher School of Economics. Series WP BRP "Science, Technology and Innovation". 2018. No. 82.

This paper contributes to the conceptualisation and operationalisation of the “technology trend” discussion in the scope of the Technology Foresight paradigm. It proposes a consistent logical approach to analysing technology trends and increasing the predictive potential of futures studies. The approach integrates Big Data analysis into the Foresight studies’ toolset by means of applying text mining, ...

Added: October 3, 2018