Generalized approach to sentiment analysis of short text messages in natural language processing
Introduction: Sentiment analysis is a complex problem whose solution depends essentially on the context, the field of study, and the amount of text data. An analysis of publications shows that authors often do not use the full range of possible data transformations and their combinations. Only some of the transformations are used, which limits the ways to develop high-quality classification models.
Purpose: To develop and explore a generalized approach to building a model, which consists of sequentially passing through the stages of exploratory data analysis, obtaining a baseline solution, vectorization, preprocessing, hyperparameter optimization, and modeling.
Results: Comparative experiments that applied the generalized approach to classical machine learning and deep learning algorithms for sentiment analysis of short text messages demonstrated that classification quality grows from one stage to the next. For classical algorithms this increase in quality was insignificant, but for deep learning it was 8% on average at each stage. Additional studies showed that automatic machine learning based on classical classification algorithms is comparable in quality to manual model development; however, it takes much longer. The use of transfer learning has a small but positive effect on classification quality.
Practical relevance: The proposed sequential approach can significantly improve the quality of models developed for natural language processing problems.
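As an illustration only, the sequence of stages named in the Purpose can be sketched on a toy problem. The messages, the multinomial naive Bayes classifier, the smoothing grid, and all function and variable names below are assumptions made for this sketch; they are not the data, algorithms, or settings used in the study.

```python
from collections import Counter
import math
import re

# Toy labeled messages (hypothetical, invented for this sketch): 1 = positive, 0 = negative.
TRAIN = [
    ("I LOVE this phone!!!", 1), ("Great service, love it", 1), ("What a great day", 1),
    ("I hate waiting", 0), ("Terrible service, hate it", 0), ("What a terrible day", 0),
]
VALID = [("love this day", 1), ("hate this day", 0)]
TEST = [("LOVE it, great!", 1), ("HATE it, terrible!", 0)]

# Stage 1: exploratory data analysis (class balance, mean message length in words).
class_balance = Counter(label for _, label in TRAIN)
mean_length = sum(len(text.split()) for text, _ in TRAIN) / len(TRAIN)

# Stage 2: baseline solution (always predict the most frequent training class).
majority = class_balance.most_common(1)[0][0]
baseline_acc = sum(label == majority for _, label in TEST) / len(TEST)

# Stage 3: vectorization (bag of words over whitespace-separated tokens).
def tokenize_raw(text):
    return text.split()

# Stage 4: preprocessing (lowercase and drop punctuation before counting words).
def tokenize_clean(text):
    return re.findall(r"[a-z']+", text.lower())

def train_nb(data, tokenize, alpha):
    """Fit a multinomial naive Bayes model with additive smoothing alpha."""
    word_counts = {}
    class_counts = Counter()
    for text, label in data:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab, alpha

def predict_nb(model, tokenize, text):
    word_counts, class_counts, vocab, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        score = math.log(class_counts[label] / total)
        for token in tokenize(text):
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][token] + alpha) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, tokenize, data):
    return sum(predict_nb(model, tokenize, t) == y for t, y in data) / len(data)

raw_acc = accuracy(train_nb(TRAIN, tokenize_raw, 1.0), tokenize_raw, TEST)
clean_acc = accuracy(train_nb(TRAIN, tokenize_clean, 1.0), tokenize_clean, TEST)

# Stage 5: hyperparameter optimization (grid search over alpha on the validation set).
ALPHA_GRID = [0.1, 1.0, 10.0]
best_alpha = max(
    ALPHA_GRID,
    key=lambda a: accuracy(train_nb(TRAIN, tokenize_clean, a), tokenize_clean, VALID),
)

# Stage 6: modeling (fit the final model with the selected settings, score on test data).
final_model = train_nb(TRAIN, tokenize_clean, best_alpha)
final_acc = accuracy(final_model, tokenize_clean, TEST)

print(dict(class_balance), mean_length)
print(baseline_acc, raw_acc, clean_acc, best_alpha, final_acc)
```

Recording the score after every stage, as the last line does, is what allows the contribution of each stage to be compared; on data this small the numbers themselves carry no weight.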