Higher School of Economics
National Research University
Application of NLP Algorithms: Automatic Text Classifier Tool

P. 310–323.
Romanov A., Kozlova E., Lomotin K.

This research addresses the design of a decision support system for categorizing scientific literature. Its purpose is to explore how machine learning algorithms can automate manual text categorization. The following stages are considered: preprocessing of raw data, word embedding, model selection, classification model, and software design. In the first stage, a training set of 200,000 Russian texts was formed in collaboration with VINITI RAS. In the second stage, the text representation was chosen and justified: Word2Vec word vectors, with each text's matrix of word vectors reduced to a single vector by “sum” convolution, at dimensionality 1500. In the third stage, the quality of the classifiers was estimated, and logistic regression, which achieved the highest F1 score (0.94), was selected. In the final stage, the ATC (Automatic Text Classifier) application, which incorporates the results of the previous stages, was developed. The overall structure of the application is described: it consists of compact program modules that can be replaced or adapted to the incoming text to achieve the highest classification score.
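
The second and third stages can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional embeddings stand in for trained Word2Vec vectors (the abstract reports dimensionality 1500), the function names `sum_pool` and `f1_score` are hypothetical, skipping out-of-vocabulary tokens is an assumption, and the F1 score is computed for a single positive class, whereas the abstract does not state how the reported score was averaged.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def sum_pool(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Reduce a text's word vectors to one vector by element-wise summation."""
    doc = [0.0] * dim
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            # Assumption: tokens without an embedding are simply skipped.
            continue
        for i, value in enumerate(vec):
            doc[i] += value
    return doc


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-in for a trained Word2Vec model (hypothetical values).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "neural": [1.0, 0.0, 0.5],
    "network": [0.5, 0.5, 0.0],
    "market": [0.0, 1.0, 0.25],
}

# "unknown" has no embedding and does not contribute to the sum.
doc_vector = sum_pool(["neural", "network", "unknown"], TOY_EMBEDDINGS, dim=3)
print(doc_vector)  # [1.5, 0.5, 0.5]

# One true positive, one false positive, one false negative.
print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```

In a full pipeline, the pooled vectors would be passed to a classifier such as logistic regression, and the F1 score on held-out texts would be used to compare candidate models.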

Language: English
Keywords: text analysis, natural language processing, decision tree, support vector machines, supervised learning, multilayer perceptron, boosting, decision support system

In book

Digital Transformation and Global Society. Third International Conference, DTGS 2018, St. Petersburg, Russia, 2018, Revised Selected Papers. Part II. Communications in Computer and Information Science, Issue 859. Springer, 2018.
Similar publications
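Analysis of Cultural References in the Works of A. Voznesensky: A Digital Study of Personal Names [Анализ культурных референций в творчестве А. Вознесенского: цифровое исследование имен персоналий]
Tyuryakova-Matveeva D., Цифровые гуманитарные исследования 2026 No. 1 P. 4–26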
The article explores cultural references in the works of Andrei Voznesensky by analyzing the personalities he mentions. A total of 1,678 works were processed, including poetry, prose, and early unpublished poems. NER methods based on Natasha, spaCy, and LLM Grok tools made it possible to study the frequency of mentions of famous people and their ...
Added: May 31, 2026
Prospects of Media Monitoring in Public Opinion Research (the Case of Trust in the President) [Перспективы медиа-мониторинга в исследованиях общественного мнения (на примере доверия президенту)]
Ankudinov I., Социология: методология, методы, математическое моделирование 2025 No. 61 P. 165–203
The changing political mood of Russians is a constant subject of interest for sociological agencies. With the development of the Internet, conventional questionnaire research began to be supplemented by online surveys and, despite some skepticism, by social media mining. This article attempts to adjust an accidental web-sample so as to bring its estimates closer to ...
Added: April 22, 2026
An Algorithm for Analyzing News Information for Economic Decision-Making [Алгоритм анализа новостной информации для принятия экономических решений]
Chudinova O. S., Pervitskaya L. A., Ramenskaya A., Индустриальная экономика 2026 No. 1 P. 65–78
This article is devoted to the development of an algorithm for analyzing news information using machine learning methods implemented in Python libraries. The choice of tools used at each stage of the algorithm is justified by calculating metrics for the quality of the solution to the corresponding machine learning problems. The algorithm’s results are presented ...
Added: April 20, 2026
RuCLEVR: A Russian Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Biryukova K., Chelnokova D., Erkenova J. et al., Communications in Computer and Information Science 2024 Vol. 2364 P. 109–121
Added: February 25, 2026
Objectification of Illness: The Phenomenon of Reification in Digital Psychiatry [Объективация болезни: феномен реификации в цифровой психиатрии]
Ugleva A. V., Вопросы философии 2025 No. 11 P. 112–123
The article focuses on the phenomenon of reification in digital psychiatry. The author highlights that AI technologies exacerbate the problem of translating complex culturally-conditioned psychiatric constructs into formal mathematical structures, which creates an illusion of objectivity and impedes the development of personalized medical care. The main objective of the article is to minimize negative consequences ...
Added: November 6, 2025
Phase probabilities in first-order transitions using machine learning
Sukhoverkhova D., Mozolenko V., Shchur L., Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 2025 Vol. 112 No. 4 Article 044128
We set out to explore the possibility of investigating the critical behavior of systems with first-order phase transition using deep machine learning. We propose a machine learning protocol with ternary classification of instantaneous spin configurations using known values of disordered phase energy and ordered phase energy. The trained neural network is used to predict whether ...
Added: October 18, 2025
The Impact of Alternative Data on Default Probability: Analyzing the Italian E-commerce Sector with NLP and Network Structures
Bernhardt B. D., Marciano C., Guarracino M. R., Operations Research Forum 2025 Vol. 6 Article 47
E-commerce is a key sector in the Italian economy, with online companies becoming some of the largest and most profitable businesses. However, this growth comes with increased risk exposure. This study aims to investigate the relationship between alternative data (contextual factors, Text-Driven Data Enrichment) and the probability of default for Italian e-commerce companies. To date, ...
Added: September 6, 2025
Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions
Chepikov I., Karpov I., in: 26th International Conference, AIED 2025, Palermo, Italy, July 22–26, 2025, Proceedings, Part I. Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.
Modern LLMs such as BERT, ChatGPT, and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and the analysis and summarization of documents. In this paper, we show that these models are close to classical ML approaches based on decision trees not only in text processing, but also in processing classical tabular data ...
Added: September 4, 2025
Yusuf-Khoja and His Brothers: On the Kinship of Afanasy Nikitin [Юсуф-Ходжа и его братья: О родстве Афанасия Никитина]
Lifshits A., Slovĕne 2025 Vol. 14 No. 1 P. 300–312
The article considers those episodes from the notes of Afanasy Nikitin that allow us to doubt his merchant status. Based on the analysis of grammar, vocabulary and pragmatics of Afanasy’s messages, it is concluded that he traveled along the Volga and further as the head of a small community of people and that he differed ...
Added: September 3, 2025
Predicting Systemic Risk in the Russian Financial Sector with Boosting Techniques
Shchepeleva M., Procedia Computer Science 2024 Vol. 242 P. 51–56
We test the predictive performance of different ensemble methods for forecasting systemic risk in Russia for the period 2008-2024. In contrast to the existing research on machine learning ensemble techniques, we find that conventional random forest works better for the Russian data. Based on this model, we additionally conduct variable importance analysis. We identify that ...
Added: June 17, 2025
Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?
Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84
Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...
Added: January 7, 2025
Latent heat estimation with machine learning
Sukhoverkhova D., Mozolenko V., Shchur L. Series arXiv "math". 2024. No. 2411.00733.
We set out to explore the possibility of investigating the critical behavior of systems with first-order phase transition using deep machine learning. We propose a machine learning protocol with ternary classification of instantaneous spin configurations using known values of disordered phase energy and ordered phase energy. The trained neural network is used to predict whether ...
Added: November 4, 2024
Semantic Text Analysis Using Artificial Neural Networks Based on Neural-Like Elements with Temporal Signal Summation
Kharlamov A., Eugeny S., Kuznetsov D. et al., Problems of Artificial Intelligence 2023 No. 3(30) P. 4–27
Text as an image is analyzed in the human visual analyzer. In this case, the image is scanned along the points of the greatest informativity, which are the inflections of the contours of the equitextural areas, into which the image is roughly divided. In the case of text analysis, individual characters of the alphabet are ...
Added: October 20, 2024
Cross-country analysis of science, technology and innovation policies: non-covid-19 related and Covid-19 specific STI policies in OECD countries
Russo M., Pavone P., Meissner D. et al., Quality and Quantity 2025 Vol. 59 Suppl. 1 P. S343–S367
In OECD countries, Science, Technology and Innovation (STI) policies were seen as key aspects of coping with the Covid-19 pandemic. Now that the pandemic is over, identifying which policy mix portfolios characterised countries in terms of their non-Covid-19 related and Covid-19 specific STI policies fills a knowledge gap on changes in STI policies induced by ...
Added: September 27, 2024
Parameter-Efficient Tuning of Transformer Models for Anglicism Detection and Substitution in Russian
Lukichev D., Kryanina D., Bystrova A. et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" [Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог»]. Issue 22. [s.n.], 2023. P. 295–306.
Added: April 25, 2024
Decision Support Systems: Textbook and Practicum for Universities. 2nd edition, revised and expanded [Системы поддержки принятия решений: учебник и практикум для вузов. 2-е издание, переработанное и дополненное]
Kravchenko T. K., Isaev D., Urait (Юрайт), 2024.
The textbook covers the informatization of decision-making processes: problem statement, typical stages, and approaches to modeling both the conditions of decision making and the consequences of choosing different options. It examines the role of expert assessments, which are used to determine the probabilities of problem situations arising, to determine the competence coefficients of the experts evaluating alternatives, and to form assessments of the alternatives under consideration. The specifics of group decision making are highlighted. Particular attention is paid to decision support in ...
Added: April 14, 2024
Machine learning approach for scientific and technical expertise
A. V. Belov, E. A. Egorova, Bulletin of D. Serikbayev East Kazakhstan Technical University 2023 No. 4 P. 92–102
When conducting scientific and technical expertise, it is necessary to analyze the texts of reports on scientific research work. The analysis is carried out in order to determine whether the research being conducted belongs to the class of scientific research and development work in the field of IT. This article discusses the tasks of binary ...
Added: March 9, 2024
Use of Text Skeleton Structures for the Development of Semantic Search Methods
A. V. Mylnikova, V. A. Trusov, L. A. Mylnikov, Automatic Documentation and Mathematical Linguistics 2023 Vol. 57 No. 5 P. 301–307
This paper considers the problem of the generation of descriptors to reduce data volumes, text data resources, and search times through the use of the new factors of authorship, region, emotive meaning, and popularity, as well as a text category without special marks that can be used to generate descriptors. This approach allows the use ...
Added: February 29, 2024
Explainable Document Classification via Pattern Structures
Sergei O. Kuznetsov, Parakal E. G., Lecture Notes in Networks and Systems 2023 Vol. 776 P. 423–434
Inherently explainable Machine Learning (ML) models are able to provide explanations for their predictions by virtue of their construction. The explanations of a ML model are more comprehensible if they are expressed in terms of its input features. Our paper proposes an inherently explainable pipeline for document classification using pattern structures and Abstract Meaning Representation ...
Added: February 5, 2024
Business Process Management Workshops. BPM 2023 International Workshops, Utrecht, The Netherlands, September 11–15, 2023, Revised Selected Papers
Switzerland: Springer, 2024.
This book constitutes revised papers from the International Workshops held at the 21st International Conference on Business Process Management, BPM 2023, in Utrecht, The Netherlands, during September 2023. Papers from the following workshops are included: • 7th International Workshop on Artificial Intelligence for Business Process Management (AI4BPM 2023) • 7th International Workshop on Business Processes Meet Internet-of-Things (BP-Meet-IoT ...
Added: January 17, 2024