• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Books
  • Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 25, 2026
HSE Scientists Train Neural Network to 'Hear' Faults in Electric Motors
Researchers at the AI and Digital Science Institute of the HSE Faculty of Computer Science have developed a new method—the Signature-Guided Data Augmentation (SGDA) framework—that achieves 99% accuracy in motor fault detection and 86% accuracy in fault classification. The application of this approach can reduce industrial equipment repair costs, minimise downtime, and improve production safety. The study results have been published in Engineering Applications of Artificial Intelligence.
May 25, 2026
'The Humanities Serve as a Conscience'
Maria Mizernaia studies Soviet literature and the history of book publishing. In this interview for the HSE Young Scientists project, she discusses plans to publish a novel about besieged Leningrad, AI-provoked reflections on what it means to be human, and how novels can help satisfy our dopamine hunger.
May 25, 2026
Is It Possible to Predict a Citys Life Based on the Shape of Its Neighbourhoods?
Is it possible to predict, based on the configuration of streets and buildings, where a café will open or where traffic congestion will occur? Participants in the Spatial Analysis and Modelling of Urban Processes research and study group use open data and machine learning to identify universal patterns. Alexander Sheludkov and Eduard Somov discuss the purpose of comparing cities, the need for new forms of urban statistics, and how open data is transforming approaches to urban studies.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling

Shumen : INCOMA Ltd, 2025.

This paper introduces a rule-based lemmatization and word embedding pipeline for the endangered Bartangi language, part of the Pamiri language group. The system combines a manually constructed lemma dictionary with morphological suffix rules to improve linguistic consistency in low-resource settings. The results demonstrate enhanced lemmatization accuracy and higher-quality embeddings for downstream NLP tasks. The work contributes to the preservation and computational modeling of underrepresented languages.

Research target: Computer Science Engineering and Technology
Language: English
Text on another site
Keywords: lemmatization word embeddingsNatural Language Processing (NLP)
Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling
Similar publications
Coping with AI errors with provable guarantees
Tyukin I., Tyukina T., van Helden D. P. et al., Information Sciences 2024 Vol. 678 Article 120856
AI errors pose a significant challenge, hindering real-world applications. This work introduces a novel approach to cope with AI errors using weakly supervised error correctors that guarantee a specific level of error reduction. Our correctors have low computational cost and can be used to decide whether to abstain from making an unsafe classification. We provide ...
Added: May 23, 2026
Overcoming the Curse of Dimensionality with Synolitic AI
Zaikin A., Sviridov I., Sosedka A. et al., Technologies 2026 Vol. 14 No. 2 Article 84
High-dimensional tabular data are common in biomedical and clinical research, yet conventional machine learning methods often struggle in such settings due to data scarcity, feature redundancy, and limited generalization. In this study, we systematically evaluate Synolitic Graph Neural Networks (SGNNs), a framework that transforms high-dimensional samples into sample-specific graphs by training ensembles of low-dimensional pairwise ...
Added: May 23, 2026
Stable On-the-Fly Learning for Dynamic Neural Networks With Delayed Inputs
Kibkalo Vladislav, Chertopolokhov V., Mukhamedov A. et al., IEEE Access 2026 Vol. 14 P. 14369–14392
This study presents on-the-fly identification and multi-step prediction of nonlinear systems with delayed inputs using a dynamic neural network combined with a smooth projection onto ellipsoids. The projection enforces parameter constraints that guarantee stability, while a Lyapunov–Krasovskii analysis yields computable ultimate error bounds. Riccati-type matrix inequalities are derived, providing an efficient vectorization–projection–devectorization implementation suitable for ...
Added: May 22, 2026
Опыт применения сетевого анализа (SNA) в историческом нарративе полисубъектного региона (на примере валлийской хроники Brut y Tywysogyon)
Loshkareva M. E., Matveeva N., Вестник Томского государственного университета. История 2026 № 100 С. 112–118
This research is an endeavor to apply social network analysis (SNA) to the study of a medieval narrative source. The authors suppose that the use of network analysis may offer new possibilities in the study of the history of regions characterized by some political fragmentation. Authors tried to construct networks of historical interactions from 1193 ...
Added: May 22, 2026
АНАЛИЗ НАПРАВЛЕНИЙ ИННОВАЦИОННОГО РАЗВИТИЯ СРЕДСТВ ИЗМЕРЕНИЙ В РОССИЙСКОЙ ФЕДЕРАЦИИ
Plotnikov S., Альманах современной метрологии 2024 № 2 (38) С. 140–149
The results of the analysis of data collected by the method of expert questioning of Russian scientific metrological institutes’ employees, as well as buyers of measuring instruments, are presented. ...
Added: May 21, 2026
АНАЛИЗ СОДЕРЖАНИЯ ФЕДЕРАЛЬНОГО ИНФОРМАЦИОННОГО ФОНДА ПО ОБЕСПЕЧЕНИЮ ЕДИНСТВА ИЗМЕРЕНИЙ В ЧАСТИ ОСЦИЛЛОГРАФОВ
Plotnikov S., Апрелев А. В., Апрелева М. А. et al., Альманах современной метрологии 2022 № 2 (30) С. 94–101
The results of the analysis of data from the Federal Information Fund for Ensuring the Uniformity of Measurements in terms of oscilloscopes are presented. ...
Added: May 21, 2026
ML-based Fast Simulation of FARICH Responses
Shipilov F., Barnyakov A., Ivanov A. et al., / Series Physics "arxiv.org". 2026.
A fast simulation of the detector response is a vital task in high-energy physics (HEP). Traditional Monte-Carlo methods form the backbone of modern particle physics simulation software but are computationally expensive. We present a machine-learning-based approach to fast simulation of the Focusing Aerogel Ring Imaging Cherenkov (FARICH) detector response. Given a particle track and momentum, ...
Added: May 19, 2026
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Rabat: Association for Computational Linguistics, 2026.
Added: May 19, 2026
Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures
Bezzubov S., Malikov D., Krasnov L. et al., Scientific data 2026 Vol. 13 Article 727
Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, in technological processes mixtures of solvents are often utilized, making the solubility assessment more complicated. Predicting solubility values in mixtures of solvents from a molecular structure can help to address this issue, although a ...
Added: May 19, 2026
Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2
Pikalov V., Meshcheryakov V., Kondratev S. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27
This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises ...
Added: May 19, 2026
Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2
Kondratev S., Yulia Dyrchenkova, Georgiy Nikitin et al., Technologies 2026 Vol. 14 No. 1 Article 69
This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises ...
Added: May 19, 2026
Parallel Computational Technologies. PCT 2025
Springer, 2025.
This book constitutes the refereed proceedings of the 19th International Conference on Parallel Computational Technologies, PCT 2025, held in Moscow, Russia, during April 8–10, 2025. The 31 full papers included in this volume were carefully reviewed and selected from 122 submissions. These papers were organized under the following topical sections: High Performance Architectures, Tools and Technologies; ...
Added: May 18, 2026
KMHCR: A Key-Controlled Signal-Domain Transformation for 5G IoT Security
Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15
To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...
Added: May 16, 2026
DPN Verifier: A Toolkit for Faster Soundness Verification and Repair of Process Models with Data
Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66
Data Petri Nets (DPNs) extend classical Petri nets to model processes where data directly influences control-flow, enabling a comprehensive view of system behavior and possibility to detect failure points that could otherwise be hidden. Soundness is a correctness criterion that captures such failure points as deadlocks and livelocks as well as model boundedness and absence ...
Added: May 16, 2026
QGKM: A Quantum Fidelity-Based Graph Clustering Framework for Robust Data Pattern Recognition in Education Social Networks
Xiong N., Long W., He D. et al., Algorithms 2026 Vol. 19 No. 5 Article 386
In the era of data-driven education, educational social networks generate large volumes of high-dimensional and complex-structured data through learner interactions, collaborative activities, and resource-sharing behaviors, posing significant challenges to traditional unsupervised learning methods. Such data often exhibit non-convex distributions, heterogeneity, and noise sensitivity, making conventional clustering approaches insufficient for capturing their intrinsic structural relationships. To ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
Parallel Computational Technologies, 19th International Conference, PCT 2025, Moscow, Russia, April 8–10, 2025, Revised Selected Papers. (CCIS, volume 2891)
Springer, 2026.
Added: May 12, 2026
Интегрированная среда моделирования для верификации и валидации программ управления подключенными и высокоавтоматизированными транспортными средствами
Stepanyants V., Долгов И. М., Хорошилов Г. С. et al., Труды Института системного программирования РАН 2026 Т. 38 № 3 С. 95–110
Highly automated and connected vehicles are gradually entering the market. Currently, solutions are being proposed that allow these technologies to be used for cooperative driving automation, which can significantly improve traffic safety. Such technologies and their software should be tested to ensure safety before being implemented in real systems. Verification and validation of vehicular control ...
Added: May 12, 2026
Connected and Automated Vehicle Scenario Manager Graphical User Interface
Tikhonov R., Efendiev M. T., Fedotenkov A. A., 2026 International Russian Smart Industry Conference (SmartIndustryCon) 2026 P. 542–547
High-fidelity simulation environments like CARLA and ROS are essential for connected and automated vehicle research. They allow researchers to verify and validate new software and technology without the time, financial, and safety overheads of real-world testing. However, their operation requires considerable expertise for creating platform-specific scenario configuration files, which complicates the research workflow. This paper ...
Added: May 11, 2026
A textual fingerprint learning model to detect fake information spreaders in social networks
Behzadidoost R., Neurocomputing 2025 Vol. 665 P. 1–21
While earlier research has focused on detecting misinformation content, identifying the users who spread it, referred to in this paper as fake information spreaders, remains a relatively new challenge. These users deliberately mix true and false information, making detection more difficult. This paper proposes a textual fingerprint learning model to detect fake information spreaders. The ...
Added: March 12, 2026
Дискриминативная лемматизация сокращений в эпоху LLM
Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 527 С. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
Rubic2: Ensemble Model for Russian Lemmatization
Afanasev I., Glazkova A., Lyashevskaya O. et al., , in: Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025).: Association for Computational Linguistics, 2025. P. 157–170.
Pre-trained language models have significantly advanced natural language processing (NLP), particularly in analyzing languages with complex morphological structures. This study addresses lemmatization for the Russian language, the errors in which can critically affect the performance of information retrieval, question answering, and other tasks. We present the results of experiments on generative lemmatization using pre-trained language ...
Added: March 10, 2026
Transformer-based approaches for lemmatizing abbreviations in Russian texts
Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47
This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...
Added: March 10, 2026
SynEL: A synthetic benchmark for entity linking
Karpov I., Kirillovich A., Goncharova E. et al., Plos One 2026 Vol. 21 No. 1 Article e0339468
Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text, demonstrating adaptability across diverse domains. However, their effectiveness varies significantly with domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks like named entity recognition, relation extraction, and entity linking. Currently, domain-specific benchmarks are ...
Added: January 15, 2026
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit