Generalized approach to sentiment analysis of short text messages in natural language processing

Polyakov E. V., Voskov L., Abramov P., Polyakov S. V.

doi:10.31799/1684-8853-2020-1-2-14

Informatsionno-upravliaiushchie sistemy [Information and Control Systems]. 2020. No. 1. P. 2–14.

Introduction: Sentiment analysis is a complex problem whose solution depends strongly on the context, the subject area and the amount of text data. An analysis of publications shows that authors often do not use the full range of possible data transformations and their combinations. Only some of the transformations are applied, which limits the ways of developing high-quality classification models.

Purpose: To develop and explore a generalized approach to building a model that passes sequentially through the stages of exploratory data analysis, obtaining a baseline solution, vectorization, preprocessing, hyperparameter optimization, and modeling.

Results: Comparative experiments that applied the generalized approach with classical machine learning and deep learning algorithms to sentiment analysis of short text messages have shown that classification quality increases from one stage to the next. For classical algorithms this increase was insignificant, whereas for deep learning it averaged 8% per stage. Additional studies have shown that automatic machine learning based on classical classification algorithms achieves quality comparable to manual model development but takes much longer. Transfer learning has a small but positive effect on classification quality.

Practical relevance: The proposed sequential approach can significantly improve the quality of models developed for natural language processing problems.
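
The staged workflow described above can be illustrated with a small, stdlib-only Python sketch. It is not the authors' code: the toy messages, the multinomial Naive Bayes classifier, the smoothing grid and the function names (`train_nb`, `run_stages`, etc.) are assumptions chosen for illustration, and the exploratory-analysis and model-selection stages are left out. Each stage builds on the previous one and reports validation accuracy, so the effect of vectorization, preprocessing and hyperparameter search can be compared step by step.

```python
import math
import re
from collections import Counter

# Toy short messages (illustrative assumption, not the paper's data).
TRAIN = (
    ["Great phone, love it!", "Love the camera. Great!", "Awful battery, hate it.",
     "Terrible screen; awful.", "great price, LOVE", "hate the update, terrible"],
    ["pos", "pos", "neg", "neg", "pos", "neg"],
)
VALID = (["GREAT!!! LOVE!!!", "awful, I hate this"], ["pos", "neg"])


def raw_tokens(text):
    # Vectorization stage: whitespace split, no cleaning.
    return text.split()


def clean_tokens(text):
    # Preprocessing stage: lowercase and keep word characters only.
    return re.findall(r"\w+", text.lower())


def train_nb(texts, labels, tokenize, alpha=1.0):
    """Multinomial Naive Bayes on bag-of-words counts with additive smoothing alpha."""
    classes = sorted(set(labels))
    priors = Counter(labels)
    counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        counts[label].update(tokenize(text))
    vocab = set().union(*counts.values())
    totals = {c: sum(counts[c].values()) for c in classes}

    def predict(text):
        scores = {}
        for c in classes:
            score = math.log(priors[c] / len(labels))
            for tok in tokenize(text):
                if tok in vocab:  # ignore tokens never seen in training
                    score += math.log((counts[c][tok] + alpha) / (totals[c] + alpha * len(vocab)))
            scores[c] = score
        return max(scores, key=scores.get)  # ties go to the first class in sorted order

    return predict


def accuracy(predict, texts, labels):
    return sum(predict(t) == y for t, y in zip(texts, labels)) / len(labels)


def run_stages(train=TRAIN, valid=VALID, grid=(0.1, 0.5, 1.0, 2.0)):
    """Return validation accuracy after each stage, plus the tuned alpha."""
    results = {}
    # Baseline: always predict the most frequent training label.
    majority = Counter(train[1]).most_common(1)[0][0]
    results["baseline"] = accuracy(lambda _t: majority, *valid)
    # Vectorization: bag of raw tokens + Naive Bayes with default smoothing.
    results["vectorization"] = accuracy(train_nb(*train, raw_tokens), *valid)
    # Preprocessing: same model on cleaned tokens.
    results["preprocessing"] = accuracy(train_nb(*train, clean_tokens), *valid)
    # Hyperparameter optimization: grid search over alpha on the validation split.
    best_alpha = max(grid, key=lambda a: accuracy(train_nb(*train, clean_tokens, a), *valid))
    results["tuned"] = accuracy(train_nb(*train, clean_tokens, best_alpha), *valid)
    return results, best_alpha


if __name__ == "__main__":
    scores, alpha = run_stages()
    for stage, acc in scores.items():
        print(f"{stage:>15}: {acc:.2f}")
    print("best alpha:", alpha)
```

On this toy data, cleaning lets the model recognize a message whose raw tokens (`GREAT!!!`, `LOVE!!!`) never occur in training. Because alpha is both tuned and scored on the same validation split here, a real comparison would report final quality on a separate held-out test set.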

Research target: Computer Science

Priority areas: IT and mathematics

Keywords: modeling; natural language processing; machine learning; deep learning; vectorization; preprocessing; automatic machine learning; transfer learning
