Comparative analysis of classification methods for text in UDC code generation problem for scientific articles

Lomotin K. E., Kozlova E. S., Romanov A.

P. 359–363.

The research is devoted to studying the applicability of the most relevant modern classification methods to the automatic generation of Universal Decimal Classification (UDC) codes for arbitrary scientific articles. The following methods are considered as classifiers: an artificial neural network, logistic regression, a naive Bayes classifier and metrical ...

Language: English

Keywords: natural language processing, machine learning, text classification, UDC

In book

Information Innovative Technologies: Materials of the International scientific-practical conference

M.: Association of graduates and employees of AFEA named after prof. Zhukovsky, 2017.

Authorship Attribution in Russian with New High-Performing and Fully Interpretable Morpho-Syntactic Features

Pimonova E., Durandin O., Malafeev A., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Lecture Notes in Computer Science, Revised Selected Papers, Vol. 11832. Cham: Springer, 2019. P. 193–204.

This work tackles the problem of modeling author style in Russian. In particular, we solve the task of authorship attribution using a collected dataset of 30 authors and 1506 texts written between the 18th and 21st centuries. We apply various approaches to the attribution problem: Random Forest, Logistic Regression and an SVM classifier. In terms ...

Added: November 7, 2019

Application of Machine Learning Methods to the Problem of Automatic Classification of Articles by UDC

Romanov A., Lomotin K. E., Kozlova E. S., Informatsionnye Tekhnologii [Information Technologies] 2017 Vol. 23 No. 6 P. 418–423

The paper deals with the applicability of modern machine learning methods to the problem of automatic generation of UDC codes for scientific articles. Artificial neural networks, logistic regression and boosting are considered as classifiers. Graph algorithms and a prototype software module for UDC generation are designed. ...

Added: July 30, 2017

Classification of Short Scientific Texts

I. K. Kusakin, Fedorets O. V., A. Y. Romanov, Scientific and Technical Information Processing 2023 Vol. 50 No. 3 P. 176–183

This paper discusses modern approaches to natural language processing and the application of machine learning models to the task of classifying short scientific texts in Russian. The study is devoted to the analysis of methods for vectorizing textual information, the selection of a model for scientific paper classification, and the training of the BERT language model ...

Added: November 4, 2023

Pulse of the Nation: Observable Subjective Well-Being in Russia Inferred from Social Network Odnoklassniki

Sergey Smetanin, Mathematics 2022 Vol. 10 No. 16 Article 2947

Policymakers and researchers worldwide are interested in measuring the subjective well-being (SWB) of populations. In recent years, new approaches to measuring SWB have begun to appear, using digital traces as the main source of information, and show potential to overcome the shortcomings of traditional survey-based methods. In this paper, we propose the formal model for ...

Added: August 15, 2022

Analysis of Images, Social Networks and Texts Third International Conference, AIST 2014, Yekaterinburg, Russia, April 10-12, 2014, Revised Selected Papers

Berlin: Springer, 2014.

This book constitutes the proceedings of the Third International Conference on Analysis of Images, Social Networks and Texts, AIST 2014, held in Yekaterinburg, Russia, in April 2014. The 11 full and 10 short papers were carefully reviewed and selected from 74 submissions. They are presented together with 3 short industrial papers, 4 invited papers and ...

Added: November 13, 2014

Multiple features for clinical relation extraction: A machine learning approach

Alimova I., Tutubalina E., Journal of Biomedical Informatics 2020 Vol. 103 P. 1–9

Relation extraction aims to discover relational facts about entity mentions from plain texts. In this work, we focus on clinical relation extraction; namely, given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding ...

Added: October 28, 2020

Aspect-Based Sentiment Analysis of Russian Hotel Reviews

Rybakov V. V., Malafeev A., in: Supplementary Proceedings of the 7th International Conference on Analysis of Images, Social Networks and Texts (AIST-SUP 2018), Moscow, Russia, July 5-7, 2018. Aachen: CEUR Workshop Proceedings, 2018. Ch. 8 P. 75–84.

The paper presents an attempt to solve the task of aspect-based sentiment analysis in the domain of Russian-language hotel reviews, using distributed representation of words. The authors follow an approach similar to [Blinov, Kotelnikov, 2014], but applied to a different domain and using different parameters. The authors also present a new dataset that is made ...

Added: February 15, 2019

Rewriting the Rules: LLMs Vs. Traditional ML in University Admissions

Chepikov I., Karpov I., in: Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium, Blue Sky, and WideAIED. Springer, 2025. P. 352–358.

Modern large language models (LLMs) such as BERT, ChatGPT and DeepSeek have shown great potential in solving various tasks, including text classification, text generation, and document analysis and summarization. In this paper, we show that these models come close to classical ML approaches based on decision trees not only in text processing but also in processing classical tabular data ...

Added: September 4, 2025

Russian Q&A Method Study: From Naive Bayes to Convolutional Neural Networks

Nikolaev K., Malafeev A., in: Analysis of Images, Social Networks and Texts. 7th International Conference AIST 2018. Springer, 2018. Ch. 12 P. 121–126.

This paper deals with automatic classification of questions in the Russian language. In contrast to previously used methods, we introduce a convolutional neural network for question classification. We took advantage of an existing corpus of 2008 questions, manually annotated in accordance with a pragmatic 14-class typology. We modified the data by reducing the typology to ...

Added: February 15, 2019

Findings of the Association for Computational Linguistics: EMNLP 2022

Association for Computational Linguistics, 2022.

Findings of the Association for Computational Linguistics conference EMNLP 2022. ...

Added: February 17, 2023

Research of heuristic approaches for determining the tonality of text messages in natural language processing problems

Polyakov E. V., Polyakov S. V., Abramov P., in: Proceedings of 2019 XVI International Symposium "Problems of Redundancy in Information and Control Systems" (REDUNDANCY). IEEE, 2019. P. 159–164.

Determining the tonality of a text is a difficult task whose solution depends essentially on the context, the subject area and the amount of text data. The analysis shows that existing works do not jointly use the full range of possible data transformations and their combinations. The ...

Added: September 20, 2020

Style transfer in NLP: a framework and multilingual analysis with Friends TV series

Tikhonova M., Elina Telesheva, Mirzoev S. et al., in: 2021 International Conference Engineering and Telecommunication (En&T). IEEE, 2022. P. 1–6.

Style transfer is an important and rapidly developing area of Natural Language Processing. Nowadays, more and more methods and models are being proposed that allow us to generate text in a predefined style. In this paper we propose a framework for style transfer based on the “Friends” TV series. The trained models are able to mimic one of ...

Added: May 21, 2022

Using Probability Distribution over Classes in Automatically Obtained Training Corpora

Durandin O., Hilal N., Strebkov D. et al., in: Proceedings of the ISMW-FRUCT 2016. [s.n.], 2016. P. 90–93.

The paper addresses a variation of the classification problem featuring class noise, in which each object in the training set is associated with a probability distribution over the set of class labels instead of a single class label. This type of task is illustrated on a complex natural language processing problem: automatic Arabic dialect classification. ...

Added: January 17, 2017

A Deep Learning Method Study of User Interest Classification

Malafeev A., Nikolaev K., in: Analysis of Images, Social Networks and Texts. 8th International Conference, AIST 2019, Kazan, Russia, July 17–19, 2019, Revised Selected Papers. Communications in Computer and Information Science, Vol. 1086. Springer, 2020. P. 154–159.

In this paper, a deep learning method study is conducted to solve a new multiclass text classification problem, identifying user interests by text messages. We used an original dataset of almost 90 thousand forum text messages, labeled for ten interests. We experimented with different modern neural network architectures: recurrent and convolutional, as well as simpler ...

Added: November 7, 2019

Artificial Intelligence. RCAI 2021. Lecture Notes in Computer Science

Springer, 2021.

This book constitutes the proceedings of the 19th Russian Conference on Artificial Intelligence, RCAI 2021, held in Moscow, Russia, in October 2021. The 19 full papers and 7 short papers presented in this volume were carefully reviewed and selected from 80 submissions. The conference deals with a wide range of topics, categorized into the following topical ...

Added: October 28, 2021

Recent Trends in Analysis of Images, Social Networks and Texts. 9th International Conference, AIST 2020, Skolkovo, Moscow, Russia, October 15–16, 2020 Revised Supplementary Proceedings

Springer, 2021.

This book constitutes revised selected papers from the 9th International Conference on Analysis of Images, Social Networks and Texts, AIST 2020, held during October 15-16, 2020. The conference was planned to take place in Moscow, Russia, but changed to an online format due to the COVID-19 pandemic. The 27 full papers and 4 short papers presented ...

Added: October 7, 2020

Breaking Sticks and Ambiguities with Adaptive Skip-gram

Bartunov S., Kondrashkin D. A., Osokin A. et al., arXiv:1502.07257, series "Computation and Language", 2015.

The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram, like most prior work on learning word representations, does not take word ambiguity into account and maintains only a single representation per word. Although a number of Skip-gram modifications were proposed to ...

Added: November 5, 2015

A General Method Applicable to the Search for Anglicisms in Russian Social Network Texts

Fenogenova A., Karpov I., Kazorin V., in: Proceedings of the Artificial Intelligence and Natural Language AINL FRUCT 2016 Conference, Saint-Petersburg, Russia, 10-12 November 2016. FRUCT Oy, 2016. P. 31–36.

With globalization, the number of borrowings from English has rapidly increased in languages all over the world. In automatic speech recognition, spell-checking, tagging and other natural language processing tasks, loan words frequently cause problems and should be treated separately. In this paper we present a ...

Added: October 19, 2016

Using BERT for the Classification of Short Scientific Texts in Russian

Kusakin I. K., Tsurupa A. M., Almakaev A. V. et al., in: NTI-2022. Scientific Information in the Modern World: Global Challenges and National Priorities. Proceedings of the 10th Scientific Conference with International Participation Dedicated to the 70th Anniversary of VINITI RAS, Moscow, October 25–26, 2022. Moscow: VINITI RAS, 2022. P. 103–109.

This work is devoted to the study of approaches to training BERT-based classifiers of scientific articles, with the goal of deploying the best models in the infrastructure of VINITI RAS. For this purpose, the BERT language model was trained on a specialized corpus of scientific texts for subsequent use ...

Added: January 31, 2023

Analysis of Images, Social Networks and Texts. 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7-9, 2016, Revised Selected Papers. Communications in Computer and Information Science

Switzerland: Springer, 2017.

This book constitutes the proceedings of the 5th International Conference on Analysis of Images, Social Networks and Texts, AIST 2016, held in Yekaterinburg, Russia, in April 2016. The 23 full papers, 7 short papers, and 3 industrial papers were carefully reviewed and selected from 142 submissions. The papers are organized in topical sections on machine ...

Added: October 19, 2016

The Presence of Order-Effect Bias in Moscow Administration

Dmitry Romanov, Kazantsev N., Edgeeva E., in: Business Process Management: Blockchain and Central and Eastern Europe Forum. BPM 2019, Vol. 361. Springer, 2019. P. 337–341.

This paper studies ‘the order effect’ in decision making based on classification results for 120 000 citizen claims to the Moscow Government. Using machine learning methods, we find that with 60% probability the first of two consecutive claims is prioritized. We conclude that this effect must be considered when developing artificial intelligence units. ...

Added: October 26, 2020

8th Russian Summer School in Information Retrieval (RuSSIR 2014)

Braslavski P., Karpov Nikolay, Worring M. et al., ACM SIGIR Forum 2014 Vol. 48 No. 2 P. 105–110

The 8th Russian Summer School in Information Retrieval (RuSSIR 2014) was held on August 18-22, 2014 in Nizhniy Novgorod, Russia. The school was co-organized by the National Research University Higher School of Economics and the Russian Information Retrieval Evaluation Seminar (ROMIP) ...

Added: August 22, 2015

Analysis of Images, Social Networks and Texts. 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers

Switzerland: Springer, 2015.

This book constitutes the proceedings of the Fourth International Conference on Analysis of Images, Social Networks and Texts, AIST 2015, held in Yekaterinburg, Russia, in April 2015. The 24 full and 8 short papers were carefully reviewed and selected from 140 submissions. The papers are organized in topical sections on analysis of images and videos; ...

Added: October 12, 2015

Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Romanov A., Lomotin Konstantin, Kozlova Ekaterina, Data Science Journal 2019 Vol. 18 No. 1 P. 1–17

This work is devoted to the study of the applicability of modern machine learning methods to the task of automatic classification of scientific articles and abstracts. For this purpose, such machine learning models as artificial neural networks, random forest, logistic regression, and support vector machine (taking into account such a ...

Added: August 25, 2019