Application of NLP Algorithms: Automatic Text Classifier Tool

A. Romanov; Ekaterina Kozlova; Lomotin Konstantin

doi:10.1007/978-3-030-02846-6_25

P. 310–323.

This research addresses the design of a decision support system for categorizing scientific literature. Its purpose is to explore how machine learning algorithms can automate manual text categorization. The following stages are considered: preprocessing of raw data, word embedding, model selection, classification model, and software design. At the first stage, a training set of 200,000 Russian texts was formed in collaboration with VINITI RAS. At the second stage, the word embedding model was chosen and justified: a Word2Vec vector representation obtained from the text matrix by “sum” convolution, with dimensionality 1500. At the third stage, the quality of the classifiers was estimated, and the logistic regression algorithm, which achieved the highest F1 score (0.94), was selected. At the final stage, the ATC (Automatic Text Classifier) application, which embeds the results of the previous stages, was developed. The overall application structure is described: it consists of compact program modules that can be replaced or adapted to the incoming text to achieve the highest classification score.

Keywords: text analysis, natural language processing, decision tree, support vector machines, supervised learning, multilayer perceptron, boosting, decision support system
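
The second and third stages of the abstract can be illustrated with a minimal sketch. It assumes that “sum” convolution means element-wise summation of the word vectors of a text, which the abstract does not spell out. The embedding table `TOY_EMBEDDINGS`, the helper names `sum_pool` and `f1_score`, and the three-dimensional vectors are hypothetical and chosen for readability; the paper reports Word2Vec vectors with dimensionality 1500. The F1 function is the binary form, whereas the paper's score of 0.94 may be averaged over many categories.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding table (an assumption for illustration only).
# The paper uses Word2Vec vectors; three dimensions are used here for readability.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "neural": [1.0, 0.0, 0.5],
    "network": [0.5, 0.5, 0.0],
    "protein": [0.0, 1.0, 1.0],
}


def sum_pool(tokens: Sequence[str],
             embeddings: Dict[str, List[float]],
             dim: int) -> List[float]:
    """Collapse the word vectors of a text into one vector by element-wise sum."""
    doc = [0.0] * dim
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            # Skipping out-of-vocabulary tokens is an assumption of this sketch.
            continue
        for i, value in enumerate(vec):
            doc[i] += value
    return doc


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(sum_pool(["neural", "network", "unknown"], TOY_EMBEDDINGS, 3))
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a full pipeline, the pooled document vectors would be the input features of a classifier such as logistic regression, and the F1 score would be computed on held-out texts to compare candidate models.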

In book

Digital Transformation and Global Society. Third International Conference, DTGS 2018, St. Petersburg, Russia, 2018, Revised Selected Papers. Part II. Communications in Computer and Information Science 859

Issue 859. Springer, 2018.

Objectification of Illness: The Phenomenon of Reification in Digital Psychiatry

Ugleva A. V., Вопросы философии 2025 No. 11 P. 112–123

The article focuses on the phenomenon of reification in digital psychiatry. The author highlights that AI technologies exacerbate the problem of translating complex culturally-conditioned psychiatric constructs into formal mathematical structures, which creates an illusion of objectivity and impedes the development of personalized medical care. The main objective of the article is to minimize negative consequences ...

Added: November 6, 2025

Phase probabilities in first-order transitions using machine learning

Sukhoverkhova D., Vyacheslav Mozolenko, Shchur L., Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 2025 Vol. 112 No. 4 Article 044128

We set out to explore the possibility of investigating the critical behavior of systems with first-order phase transition using deep machine learning. We propose a machine learning protocol with ternary classification of instantaneous spin configurations using known values of disordered phase energy and ordered phase energy. The trained neural network is used to predict whether ...

Added: October 18, 2025

The Impact of Alternative Data on Default Probability: Analyzing the Italian E-commerce Sector with NLP and Network Structures

Bernhardt B. D., Marciano C., Guarracino M. R., Operations Research Forum 2025 Vol. 6 Article 47

E-commerce is a key sector in the Italian economy, with online companies becoming some of the largest and most profitable businesses. However, this growth comes with increased risk exposure. This study aims to investigate the relationship between alternative data (contextual factors, Text-Driven Data Enrichment) and the probability of default for Italian e-commerce companies. To date, ...

Added: September 6, 2025

Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions

Chepikov I., Karpov I., in: Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED.: Springer, 2025. P. 352–358.

Modern LLMs such as BERT, ChatGPT, and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and the analysis and summarization of documents. In this paper, we show that these models are close to classical ML approaches based on decision trees not only in text processing, but also in processing classical tabular data ...

Added: September 4, 2025

Yusuf-Khoja and His Brothers: On the Kinship of Afanasy Nikitin

Lifshits A., Slovĕne 2025 Vol. 14 No. 1 P. 300–312

The article considers those episodes from the notes of Afanasy Nikitin that allow us to doubt his merchant status. Based on the analysis of grammar, vocabulary and pragmatics of Afanasy’s messages, it is concluded that he traveled along the Volga and further as the head of a small community of people and that he differed ...

Added: September 3, 2025

Predicting Systemic Risk in the Russian Financial Sector with Boosting Techniques

Shchepeleva M., Procedia Computer Science 2024 Vol. 242 P. 51–56

We test the predictive performance of different ensemble methods for forecasting systemic risk in Russia for the period 2008-2024. In contrast to the existing research on machine learning ensemble techniques, we find that conventional random forest works better for the Russian data. Based on this model, we additionally conduct variable importance analysis. We identify that ...

Added: June 17, 2025

Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?

Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84

Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...

Added: January 7, 2025

Latent heat estimation with machine learning

Sukhoverkhova D., Mozolenko V., Shchur L. / Series arXiv "math". 2024. No. 2411.00733.

Added: November 4, 2024

Semantic Text Analysis Using Artificial Neural Networks Based on Neural-Like Elements with Temporal Signal Summation

Kharlamov Alexander, Eugeny S., Kuznetsov D. et al., Problems of Artificial Intelligence 2023 No. 3(30) P. 4–27

Text as an image is analyzed in the human visual analyzer. In this case, the image is scanned along the points of the greatest informativity, which are the inflections of the contours of the equitextural areas, into which the image is roughly divided. In the case of text analysis, individual characters of the alphabet are ...

Added: October 20, 2024

Cross-country analysis of science, technology and innovation policies: non-covid-19 related and Covid-19 specific STI policies in OECD countries

Russo M., Pavone P., Meissner D. et al., Quality and Quantity 2024 P. 1–25

In OECD countries, Science, Technology and Innovation (STI) policies were seen as key aspects of coping with the Covid-19 pandemic. Now that the pandemic is over, identifying which policy mix portfolios characterised countries in terms of their non-Covid-19 related and Covid-19 specific STI policies fills a knowledge gap on changes in STI policies induced by ...

Added: September 27, 2024

Parameter-Efficient Tuning of Transformer Models for Anglicism Detection and Substitution in Russian

Daniil Lukichev, Kryanina Darya, Anastasia Bystrova et al., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference «Диалог». Issue 22.: [s.n.], 2023. P. 295–306.

Added: April 25, 2024

Decision Support Systems: Textbook and Workbook for Universities. 2nd ed., revised and expanded

Kravchenko T. K., Isaev D., Юрайт, 2024.

The textbook examines the informatization of decision-making processes: problem statement, typical stages, and approaches to modeling the conditions of decision-making as well as the consequences of choosing among alternatives. It considers the role of expert assessments, which are used to determine the probabilities of problem situations arising, to determine the competence coefficients of the experts who evaluate alternatives, and to form assessments of the alternatives under consideration. The specifics of group decision-making are highlighted. Particular attention is paid to decision support at ...

Added: April 14, 2024

Machine learning approach for scientific and technical expertise

A. V. Belov, E. A. Egorova, Bulletin D. Serikbayev East Kazakhstan Technical University 2023 No. 4 P. 92–102

When conducting scientific and technical expertise, it is necessary to analyze the texts of reports on scientific research work. The analysis is carried out in order to determine whether the research being conducted belongs to the class of scientific research and development work in the field of IT. This article discusses the tasks of binary ...

Added: March 9, 2024

Use of Text Skeleton Structures for the Development of Semantic Search Methods

A. V. Mylnikova, V. A. Trusov, L. A. Mylnikov, Automatic Documentation and Mathematical Linguistics 2023 Vol. 57 No. 5 P. 301–307

This paper considers the problem of the generation of descriptors to reduce data volumes, text data resources, and search times through the use of the new factors of authorship, region, emotive meaning, and popularity, as well as a text category without special marks that can be used to generate descriptors. This approach allows the use ...

Added: February 29, 2024

Explainable Document Classification via Pattern Structures

Sergei O. Kuznetsov, Parakal E. G., Lecture Notes in Networks and Systems 2023 Vol. 776 P. 423–434

Inherently explainable Machine Learning (ML) models are able to provide explanations for their predictions by virtue of their construction. The explanations of a ML model are more comprehensible if they are expressed in terms of its input features. Our paper proposes an inherently explainable pipeline for document classification using pattern structures and Abstract Meaning Representation ...

Added: February 5, 2024

Business Process Management Workshops. BPM 2023 International Workshops, Utrecht, The Netherlands, September 11–15, 2023, Revised Selected Papers

Switzerland: Springer, 2024.

This book constitutes revised papers from the International Workshops held at the 21st International Conference on Business Process Management, BPM 2023, in Utrecht, The Netherlands, during September 2023. Papers from the following workshops are included: • 7th International Workshop on Artificial Intelligence for Business Process Management (AI4BPM 2023) • 7th International Workshop on Business Processes Meet Internet-of-Things (BP-Meet-IoT ...

Added: January 17, 2024

The Chekhov Digital Project: Tasks and Problems of Implementing Semantic Markup of Texts (the Case of A. P. Chekhov's Story "Death of an Official")

Северина Е. М., Ларионова М. Ч., Litera 2023 No. 10 P. 211–222

The article considers a model of preparation of machine-readable (semantic) markup of texts for the Chekhov Digital project on the example of philological interpretation of individual significant elements of A. P. Chekhov's story "Death of an Official" and presentation of this information explicitly based on the standards of digital publication Text Encoding Initiative (TEI/XML). Based ...

Added: January 12, 2024

Development of a System for Generating Everyday Dialogues in Russian: A Pilot Study

Кругликова В. Г., in: Speech Analysis: Theoretical and Applied Aspects: A Collection of Scientific Articles.: [s.n.], 2023.

The article presents a comparative analysis of various language models used to generate texts and evaluates their effectiveness for the task of generating conversational speech. The comparison involves models such as GPT-3, BERT, and LSTM. This study is part of a project to develop a system for generating dialogues in Russian. The ...

Added: December 10, 2023

Investor sentiment and the NFT hype index: to buy or not to buy?

Baklanova V., Kurkin A., Teplova T., China Finance Review International 2024 Vol. 14 No. 3 P. 522–548

Purpose – The primary objective of this research is to provide a precise interpretation of the constructed machine learning model and produce definitive summaries that can evaluate the influence of investor sentiment on the overall sales of non-fungible token (NFT) assets. To achieve this objective, the NFT hype index was constructed as well as several approaches of ...

Added: December 10, 2023

Think about what you’ve learned: Sentiment Analysis for Modeling User Experience in Online Education

Kirina M., Человек: образ и сущность. Гуманитарные аспекты 2024 No. 2(58) P. 176–204

The article focuses on the application of opinion mining techniques to evaluate user experience on the Hyperskill educational platform, using Python, Java, and Kotlin programming projects as the basis of analysis. The study utilizes sentiment analysis and keyword extraction methods to gauge users' attitudes towards the platform, learning process, and topics covered. To achieve this, ...

Added: December 9, 2023