Bridging Gaps in Russian Language Processing: AI and Everyday Conversations

P. 253–258.

Tatiana Sherstinova, Nikolay Mikhaylovskiy, Evgenia Kolpashchikova, Violetta Kruglikova

Contemporary advances in NLP and neural network techniques make it possible to enhance and exploit traditional linguistic resources and corpora, and to extend the application of neural networks to complex language material. At the same time, the processing of spontaneous everyday speech remains a weak point for both theoretical and applied linguistic tasks. The two experiments described in this article analyze how successfully modern neural models cope with the recognition and generation of everyday Russian speech. The material for the experiments is the well-known ORD speech corpus, the largest collection of professional and everyday dialogues in Russian. The first experiment addresses the pressing problem of increasing the volume of transcribed speech data by means of state-of-the-art automatic speech recognition (ASR). Recognition was performed with two different systems: the NTR Acoustic Model and OpenAI's Whisper. The second experiment focuses on fine-tuning generative language models for Russian on a conversational dataset. A prototype dialogue system built on the fine-tuned ruGPT-3 Small model illustrates the potential of fine-tuning for dialogue generation tasks. The results are used to enrich datasets for recognizing everyday Russian speech and to build chatbots that emulate spontaneous Russian conversation.

Publication based on the results of:

Text as Big Data: methods and models for big text data analysis (2024)

In book

Proceedings of the 35th Conference of Open Innovations Association FRUCT, 24-26 April 2024, Tampere, Finland

Issue 1. FRUCT Oy, 2024.

Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

Seoul: PMLR, 2026.

Added: June 4, 2026

Granular computing-based deep learning for text classification

Behzadidoost R., Mahan F., Izadkhah H., Information Sciences 2024 Vol. 652 Article 119746

Granular computing involves a comprehensive process that encompasses theories, methodologies, and techniques to solve complex problems, rather than being just an algorithm. As the volume of generated data continues to grow rapidly, data-driven problems have become increasingly complex. Although deep learning models have outperformed traditional machine learning models in solving complex problems, there is still room for enhancing their performance. ...

Added: March 12, 2026

30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)

Springer, 2025.

The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...

Added: February 3, 2026

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

INCOMA Ltd, 2021.

Added: January 28, 2026

Screen-Cam Imitation Module for Improving Data Hiding Robustness

Dzhanashia K., Fedosov A., Evsutin O., Sensors 2025 Vol. 25 No. 23 Article 7726

Using an attack-simulation module is a well-recognized approach to improving the robustness of end-to-end neural-network-based data-hiding schemes. However, most proposed attack simulators are limited in the types of attacks they cover, usually handling only a basic set of digital transformations. Real, in-demand use cases for data-hiding methods may involve modifications that cannot be modeled by ...

Added: November 28, 2025

Understanding the training dynamics of CoLaNET by its simplified model

Goryunov O. A., Maslennikov O. V., Kiselev M. V. et al., Chaos, Solitons and Fractals 2026 Vol. 203 Article 117663

Training complex, biologically plausible Spiking Neural Networks (SNNs) with local learning rules is a significant challenge for theoretical analysis. Here we address this problem by developing a comprehensive analytical theory for the learning dynamics of CoLaNET, a recently proposed columnar SNN. In particular, we consider a simplified model that captures the core algorithmic logic of ...

Added: November 28, 2025

Related Rights to Intellectual Property Created by Artificial Intelligence: A Philosophical and Legal Analysis of Replacing the Creativity Criterion with an Investment Criterion

Pakshin P., Actual Problems of Russian Law 2025 Vol. 20 No. 11 P. 11–18

The paper argues that intellectual property created by artificial intelligence should receive legal protection through the mechanism of related rights. It examines ways to reduce the legal risks associated with creating intellectual property using artificial intelligence technologies and offers a philosophical and legal analysis of the proposed hypothesis, namely, ...

Added: November 27, 2025

Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Association for Computational Linguistics, 2025.

Added: November 17, 2025

2025 International Joint Conference on Neural Networks (IJCNN)

IEEE, 2025.

Added: November 15, 2025

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Anton R., Mikhalchuk M., Rahmatullaev T. et al., in: Findings of the Association for Computational Linguistics: NAACL 2025. Association for Computational Linguistics, 2025. P. 7757–7764.

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even when only irrelevant tokens are removed. Our analysis ...

Added: November 6, 2025

Segmentation of Vertebral Arteries on the MR Images

Prikhodko R., Moshkin A., Romanov A., in: 2025 International Russian Automation Conference (RusAutoCon). IEEE, 2025. P. 273–278.

The vertebral arteries are one of the most important sources of blood supply to the brain; therefore, any pathological changes in them can cause serious diseases. Magnetic Resonance Imaging (MRI) allows diagnosticians to examine the main arteries, which is essential for effective diagnosis. However, because of the small size of arteries relative ...

Added: November 6, 2025

Machine Learning and Information Representation: New Opportunities for Digital Archives (review of the book: Artificial Intelligence, Archives and Manuscripts. New Relationships between the Virtual Archive and Its Referent. Edinburgh: University of Edinburgh, 2025)

Penskaja E., Imagology and Comparative Studies 2025 No. 23 P. 380–389

The book Artificial Intelligence, Archives and Manuscripts. New Relationships between the Virtual Archive and Its Referent (2025) is presented. This collective monograph discusses the technological, legal, and intellectual issues that researchers and archivists face when working with manuscript heritage by automated means, including artificial intelligence and neural networks. ...

Added: October 30, 2025

Free energy of neural network can predict accuracy after pruning

Surkov A., Koltcov S., Ignatenko V. et al., Physica A: Statistical Mechanics and its Applications 2025 Vol. 681 Article 131085

Neural networks are powerful tools capable of achieving state-of-the-art performance across a wide range of tasks; however, their effectiveness often comes at the cost of extremely large numbers of parameters, which can hinder their deployment in resource-constrained environments. To address this issue, various pruning techniques have been proposed to reduce model size and complexity while ...

Added: October 30, 2025

Well-Being Research with Advanced Natural Language Processing (NLP) Methods: Prospects and Limitations

Voevodina E., Journal of Modern Foreign Psychology 2025 Vol. 14 No. 3 P. 172–181

Context and relevance. Well-being research faces the methodological limitations of conventional psychometric measures, which are criticized for poor ecological validity, limited information yield, and inadequate capture of the multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate the opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...

Added: October 9, 2025

Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions: 34th International Conference on Artificial Neural Networks, Kaunas, Lithuania, September 9–12, 2025, Proceedings, Part V

Cham: Springer, 2025.

This book constitutes the refereed proceedings of the workshops held in conjunction with the 34th International Conference on Artificial Neural Networks and Machine Learning, ICANN 2025, which took place in Kaunas, Lithuania, September 9–12, 2025. The 20 full papers and 8 abstracts included in this workshop volume were carefully reviewed and selected from 42 submissions. ...

Added: September 29, 2025

Digital Theater of the Absurd: Can Neural Networks Pose a New Scientific Problem for Psychology? A Case Comparison of ChatGPT and DeepSeek

Khashutogova U. P., Berezner T., Poddiakov A., New Psychological Research 2025 No. 3 P. 100–125

The rapid advancement of artificial intelligence technologies has drawn increasing attention from psychological researchers. While neural networks are being integrated into nearly all domains of human activity, the boundaries of their applicability remain unclear — particularly regarding the originality and practical value of the content they generate. Proponents advocate for their widespread adoption, whereas skeptics ...

Added: September 4, 2025

Ethical Aspects of Using Artificial Intelligence in Education When Students Prepare Written Assignments

Postavneva I. V., Dvoinin A., Information and Education: Boundaries of Communications 2025 Vol. 25 No. 17 P. 58–59

The article discusses ethically correct and incorrect ways for students to use generative artificial intelligence when preparing written assignments. In this context, it outlines the boundaries of ethically acceptable actions involving neural networks. ...

Added: August 19, 2025

Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Tartu: University of Tartu Library, 2025.

The third workshop on resources and representations for under-resourced languages and domains was held in Tallinn, Estonia, on March 2nd, 2025. The workshop was conducted in person but also provided an option for online participation. In alignment with the goals of the previous two workshops in 2020 and 2023, RESOURCEFUL-2025 explored the role of resource ...

Added: July 17, 2025