Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

A. Romanov; Lomotin Konstantin; Kozlova Ekaterina

doi:10.5334/dsj-2019-037

Publications

?

Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Data Science Journal. 2019. Vol. 18. No. 1. P. 1–17.

Romanov A., Lomotin Konstantin, Kozlova Ekaterina

This work is devoted to the study of applicability of modern methods of machine learning to the task of automatic classification of scientific articles and abstracts. For this purpose, the study of such models of machine learning as artificial neural networks, random forest, logistic regression, and support vector machine (with taking into account such a feature of scientific texts as a large number of terms specific for various categories) was carried out. Separately, the stages of data collection and extraction of text characteristics are considered. The results of research are used in development of a decision support system for assignment of scientific texts to the code of the department or abstract journal of All-Russian Institute of Scientific and Technical Information of Russian Academy of Sciences.

Research target: Computer Science

Priority areas: IT and mathematics engineering science

Keywords: support vector machines method logistic regression machine learning text classification artificial neural network random forest

Метод преобразования речевого сигнала для улучшения разборчивости речи

Savchenko L., Савченко В. В., Радиотехника и электроника 2025 Т. 70 № 8 С. 753–760

The problem of improving speech intelligibility in voice communication systems is considered. The acute issue of speaker recognition when applying known methods for solving this problem is highlighted. To overcome the specified problem, a new method for transforming the speech signal based on an autoregressive model of the vocal tract and the principle of frequency-selective ...

Added: January 29, 2026

Specification Tests for Jump-Diffusion Models Based on the Characteristic Function

Belomestny D., Grobler G. L., Meintanis S. G. et al., International Statistical Review 2026 P. 1–31

Goodness-of-fit tests are suggested for several popular jump-diffusion processes. The suggested test statistics utilise the marginal characteristic function of the model and its L2-type discrepancy from an empirical counterpart. Model parameters are estimated either by minimising the aforementioned L2-type discrepancy or by maximum likelihood. A hybrid estimation method that uses moment estimation is also proposed ...

Added: January 29, 2026

An Analysis of Sequential Patterns in Datasets for Evaluation of Sequential Recommendations

Klenitskiy A., Володкевич А. А., Pembek A. et al., ACM Transactions on Recommender Systems 2026

Sequential recommender systems are an important and in-demand area of research. These systems aim to use the order of interactions in a user’s history to predict future interactions. The premise is that the order of interactions and sequential patterns play an essential role. Therefore, it is crucial to use datasets that exhibit a sequential structure ...

Added: January 28, 2026

Autoregressive generation strategies for Top-K sequential recommendations

Klenitskiy A., Гусак Д. И., Володкевич А. А. et al., User Modelling and User-Adapted Interaction 2025 No. 35 Article 13

The goal of modern sequential recommender systems is often formulated in terms of next-item prediction. In this paper, we explore the applicability of transformer-based generative models for the Top-K sequential recommendation task, where the goal is to predict items that a user is likely to interact with in the “near future.” This goal aligns with ...

Added: January 26, 2026

Marchenko–Pastur Law for Spectra of Random Weighted Bipartite Graphs

Nadutkina A., Tikhomirov A., Timushev D., Siberian Advances in Mathematics 2025 Vol. 34 P. 146–153

We study the spectra of random weighted bipartite graphs. We establish that, under specific assumptions on the edge probabilities, the symmetrized empirical spectral distribution function of the graph’s adjacency matrix converges to the symmetrized Marchenko-Pastur distribution function. ...

Added: January 26, 2026

Conceptual Knowledge Structures First International Joint Conference, CONCEPTS 2024, Cádiz, Spain, September 9–13, 2024, Proceedings

Obiedkov S., Switzerland: Springer, 2024.

This book constitutes the proceedings of the First International Joint Conference on Conceptual Knowledge Structures, CONCEPTS 2024, which took place in Cádiz, Spain, during September 9-13, 2024. The conference is an amalgamation of the 18th International Conference on Formal Concept Analysis (ICFCA); the 17th International Conference on Concept Lattices and Their Applications (CLA); and the 28th ...

Added: January 23, 2026

Cooperative games with fuzzy characteristic functions on concept lattices

Kemgne M. W., Njionou B. B., Ignatov D. I. et al., International Journal of Approximate Reasoning 2025 Vol. 186 P. 1–18

This paper introduces cooperative games with transferable utilities and fuzzy characteristic functions on concept lattices. While previous works have independently addressed games with fuzzy payoffs and games restricted to structured coalition systems such as lattices, our approach combines both perspectives. We consider cooperative settings where coalition formation is constrained by a concept lattice structure, and ...

Added: January 23, 2026

Run time dynamic digital twins and dynamic digital twins networks

Vodyaho A., Delhibabu R., Ignatov D. I. et al., Future Generation Computer Systems 2025 Vol. 172 P. 1–18

Digital twins are widely used for building various types of cyber–physical systems. There are a huge number of publications devoted to the use of digital twins in production systems. Much less attention is paid to the issues of building runtime digital twins. The article describes an approach to building complex distributed cyber–physical systems with a ...

Added: January 23, 2026

LAMBO: Landmarks Augmentation With Manifold-Barycentric Oversampling

Bespalov Y., Buzun N., Kachan O. et al., IEEE Access 2022 No. 10 Article 3219934

We propose the first data augmentation method based on optimal transport theory, with the generated data being guaranteed to belong to the original data manifold. The proposed algorithm randomly samples a clique in the nearest-neighbors graph representing the data knowledge and computes the Wasserstein barycenter between the neighbours with random uniform weights. Being extremely natural- ...

Added: January 21, 2026

Blurred Magnitude Homology of Functional Connectome for ASD Diagnosis

Alexander Kachura, Vsevolod Chernyshev, Kachan O. et al., Frontiers in Psychiatry 2026 Vol. 16 Article 1677282

Autism spectrum disorder (ASD) is one of the most common neurodevelopmental disorders. Existing studies show that adults with ASD may experience accelerated or altered neurocognitive aging. Consequently, cognitive decline in people with ASD can be delayed if timely measures are taken to treat this disorder. This study focuses on the development of a new algorithm ...

Added: January 21, 2026

19th Annual Conference, TAMC 2025, Jinan, China, September 19–21, 2025, Proceedings. Theory and Applications of Models of Computation. Lecture Notes in Computer Science (LNCS, volume 16084)

Springer, 2026.

This book constitutes the proceedings of the 19th Annual Conference on Theory and Applications of Models of Computation, TAMC 2025, which was held in Jinan, China, during September 19–21, 2025. ...

Added: January 20, 2026

11th Russian Supercomputing Days, RuSCDays 2025, Moscow, Russia, September 29–30, 2025, Revised Selected Papers

Springer, 2026.

Added: January 20, 2026

Computer-aided system for assessing and selecting effective masters' learning trajectory in variability of external factors considering the university industrial partners' opinion

A. V. Vishnekov, E. M. Ivanova, N. Zhursunova, Информатика и образование 2025 Vol. 40 No. 6 P. 39–48

The modern education system development is characterized by many uncertain, dynamically changing factors. The purpose and originality of the study presented in the article is to develop an automated system for building an effective educational trajectory in the conditions of external factors’ uncertainty. The developed system is a tool for assessment and dynamic adjustment of ...

Added: January 16, 2026

Многоаспектная оценка методов адаптации токенизатора для больших языковых моделей на русском языке

Andriushchenko G. D., Godunova M., Ivanov V. et al., Doklady Mathematics 2025 Vol. 527 P. 320–331

Large language models (LLMs) pretrained on English-centered corpora have biases and perform sub-optimally on other natural languages. Adaptation of LLMs vocabulary provides a resource-efficient way to improve the quality of a pretrained model. Previously proposed adaptation techniques focus on performance (accuracy) and size metrics (fertility), ignoring other aspects in comparison, such as inference latency, compute ...

Added: January 15, 2026

Iterative Ricci-Foster Curvature Flow with GMM-Based Edge Pruning: A Novel Approach to Community Detection

Sorokin K., Beketov M., Онучин А. et al., / arxiv.org. Серия cs.SI "Social and Information Networks ". 2025.

Community detection in complex networks is a fundamental problem, open to new approaches in various scientific settings. We introduce a novel community detection method, based on Ricci flow on graphs. Our technique iteratively updates edge weights (their metric lengths) according to their (combinatorial) Foster version of Ricci curvature computed from effective resistance distance between the ...

Added: January 15, 2026

SynEL: A synthetic benchmark for entity linking

Karpov I., Kirillovich A., Goncharova E. et al., Plos One 2026 Vol. 1 No. 1 P. 1–18

Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text, demonstrating adaptability across diverse domains. However, their effectiveness varies significantly with domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks like named entity recognition, relation extraction, and entity linking. Currently, domain-specific benchmarks are ...

Added: January 15, 2026

On syntactic concept lattice models for the Lambek calculus and infinitary action logic

Stepan L. Kuznetsov, Journal of Logic and Computation 2026 Vol. 36 No. 1 Article exaf078

The linguistic applications of the Lambek calculus suggest its semantics over algebras of formal languages. A straightforward approach to construct such semantics indeed yields a brilliant completeness theorem (Pentus 1995, Ann. Pure Appl. Logic, 75, 179–213). However, extending the calculus with extra operations ruins completeness. In order to mitigate this issue, Wurm (2017, J. Logic Lang. Inf., ...

Added: January 14, 2026

Method of a voice source acoustic analysis in real time

Savchenko V., Savchenko L., Measurement Techniques 2025 Vol. 68 P. 453–463

The problem of non-invasive analysis of the vocal function of the speech apparatus based on the speaker’s speech signal is addressed. A new method of acoustic analysis of a pulse-type voice source based on a two-stage measurement procedure has been developed. The first stage of measurements provides for filtering of the voice excitation signal of the vocal tract ...

Added: January 13, 2026

KDD '25: Proceedings of the 31th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Volume 2

Association for Computing Machinery (ACM), 2025.

Added: January 12, 2026

Aggression in Digital Interactions: The Effect of Toxicity in Online Gaming Communication

Iuliia Naidenova, Parshakov P., Matkin N., Journal of Content, Community and Communication 2025 Vol. 23 Article 5

This study analyzes the computer mediated text communication of non-professional video game players. The purpose of this study is to identify the impact of player communication toxicity on team performance. The dataset comprises 42,720 matches played between November 5 and November 18, 2015, including game statistics and chat messages. We use a BERT-model to classify ...

Added: December 29, 2025

Implementing Transport Coding in OMNeT++ for Message Delay Reduction

Petrovanov I., Sergeev A., / Series Computer Science "arxiv.org". 2025. No. 2512.18332.

Transport coding reduces message delay in packet-switched networks by introducing controlled redundancy at the transport layer: original packets are encoded into coded packets, and the message is reconstructed after the first successful deliveries, effectively shifting latency from the maximum packet delay to the -th order statistic. We present a concise, reproducible discrete-event implementation of transport coding in OMNeT++, including ...

Added: December 24, 2025

Method of Automated Dataset Collection for Microwave Filters Synthesis

Arinin O. V., Bakhmach D. M., Katsnelson A. et al., , in: 2025 Systems of Signals Generating and Processing in the Field of on Board Communications.: IEEE, 2025. P. 1–5.

This research discusses the method of dataset collection automatization for microwave filter synthesis by integrating machine learning techniques, thus reducing development time. Utilizing the 3D electromagnetic analysis software package, the study involves simulation and collecting geometric parameters and amplitude-frequency characteristics from three variants of passband highly selective microstrip tworesonator combined filters with stepped impedance resonators. ...

Added: December 6, 2025

Hessian-based lightweight neural network for brain vessel segmentation on a minimal training dataset

Меньшиков И. А., Бернадотт А. К., Elvimov N. S., / Series arXie "Statistical mechanics". 2025.

Accurate segmentation of blood vessels in brain magnetic resonance angiography (MRA) is essential for successful surgical procedures, such as aneurysm repair or bypass surgery. Currently, annotation is primarily performed through manual segmentation or classical methods, such as the Frangi filter, which often lack sufficient accuracy. Neural networks have emerged as powerful tools for medical image ...

Added: December 1, 2025

Determining the boundary of dynamical chaos in the generalized Chirikov map via machine learning

Чернышов Д. П., Satanin A., Shchur L., / Series arXiv "math". 2025.

We investigate the boundary separating regular and chaotic dynamics in the generalized Chirikov map, an extension of the standard map with phase-shifted secondary kicks. Lyapunov maps were computed across the parameter space (K,K(α, τ)) and used to train a convolutional neural network (ResNet18) for binary classification of dynamical regimes. The model reproduces the known critical ...

Added: November 21, 2025