

The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group

P. 174–186.
Afanasev I.

The study of low-resourced East Slavic lects is becoming increasingly relevant as they face the prospect of extinction under the pressure of standard Russian while being treated by academia as an inferior part of this lect. The Khislavichi lect, spoken in a settlement on the border of Russia and Belarus, is a perfect example of such an attitude. We take an alternative approach and study East Slavic lects (such as Khislavichi) as separate systems. The proposed method includes the development of a tagged corpus through morphological tagging with the models trained on the bigger lects. Morphological tagging results may be used to place these lects among the bigger ones, such as standard Belarusian or standard Russian. The implemented morphological taggers of standard Russian and standard Belarusian demonstrate an accuracy higher than the accuracy of multilingual models by 3 to 15%. The study suggests possible ways to adapt these taggers to the Khislavichi dataset, such as tagset unification and transcription closer to the actual sound rather than the standard lect pronunciation. Automatic classification supports the hypothesis that Khislavichi is a border East Slavic lect that historically was Belarusian but got russified: the algorithm places it either slightly closer to Russian or to Belarusian.
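The idea of placing a small lect among bigger ones by tagging accuracy can be illustrated with a minimal sketch. This is not the procedure from the chapter: the tag mapping `UNIFIED`, the function names, and the toy tag sequences are all hypothetical, and the sketch simply assumes that the tagger whose unified output agrees most often with the gold tags indicates the closer lect.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from tagger-specific tags to a shared tagset
# (a stand-in for "tagset unification"; the real tagsets are not given here).
UNIFIED: Dict[str, str] = {"S": "NOUN", "V": "VERB", "A": "ADJ", "PR": "ADP"}


def unify(tags: Sequence[str]) -> List[str]:
    """Map each tag to the shared tagset, leaving unknown tags unchanged."""
    return [UNIFIED.get(tag, tag) for tag in tags]


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)


def closest_lect(gold: Sequence[str],
                 predictions: Dict[str, Sequence[str]]) -> str:
    """Return the name of the tagger whose unified output best matches gold."""
    unified_gold = unify(gold)
    scores = {name: accuracy(unified_gold, unify(pred))
              for name, pred in predictions.items()}
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
gold_tags = ["NOUN", "VERB", "ADJ", "NOUN"]
tagger_output = {
    "standard Russian": ["S", "V", "A", "V"],
    "standard Belarusian": ["NOUN", "VERB", "NOUN", "VERB"],
}
print(closest_lect(gold_tags, tagger_output))
```

Unifying the tags before comparison matters in such a setup: without it, two taggers that agree on the analysis but use different labels would be counted as disagreeing.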

Language: English
Keywords: dialectology; Natural Language Processing (NLP); automatic classification; morphological tagging; digital dialectology; Khislavichi

In book

Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
Association for Computational Linguistics, 2023.
Similar publications
A textual fingerprint learning model to detect fake information spreaders in social networks
Behzadidoost R., Neurocomputing 2025 Vol. 665 P. 1–21
While earlier research has focused on detecting misinformation content, identifying the users who spread it, referred to in this paper as fake information spreaders, remains a relatively new challenge. These users deliberately mix true and false information, making detection more difficult. This paper proposes a textual fingerprint learning model to detect fake information spreaders. The ...
Added: March 12, 2026
Discriminative Lemmatization of Abbreviations in the LLM Era
Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (formerly Доклады Академии Наук. Математика) 2025 Vol. 527 P. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
Transformer-based approaches for lemmatizing abbreviations in Russian texts
Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47
This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...
Added: March 10, 2026
The Grammatical Landscape of Literary Prose: Dynamics of Part-of-Speech Distributions in the 20th-Century Russian Short Story
Kirina M., in: Русская грамматика: полипарадигмальность как методологический принцип современных научных исследований: материалы IX Международного научного симпозиума. Издательство ИГУ, 2025. P. 270–275.
The article presents the results of a pilot study aimed at describing the distribution of parts of speech in synchrony and diachrony, based on Russian short-form prose. It examines changes in the morphological composition of literary texts (at the level of grammatical classes) throughout the 20th century across 9 historical and cultural periods. The material is a sample of 943 short stories with a total size of more than 3 million word tokens. ...
Added: February 28, 2026
Samples of the Dialect of Macedonian Settlers in South Banat, Republic of Serbia: The Villages of Kačarevo and Glogonj, Municipality of Pančevo
Muravleva N., in: Исследования по славянской диалектологии. Выпуск 25. Vol. 25. Moscow: Институт славяноведения РАН, 2025. P. 426–441.
The article publishes narratives in Macedonian recorded during a 2023 expedition (Borisov, Kikilo, Nemchinov 2024) from informants who belong to the Macedonian minority living in the villages of Kačarevo and Glogonj (Serbian: Kačarevo, Glogonj) in the municipality of Pančevo, Vojvodina, Republic of Serbia. The dialect texts reflect contact phenomena that arose under the influence of the majority Serbian language, as well as a mixing of ...
Added: February 18, 2026
Preterite Forms in the Idiom of Macedonian Settlers in Vojvodina (Serbia)
Muravleva N., Славянский мир в третьем тысячелетии 2025 Vol. 20 No. 3-4 P. 144–172
The article examines the features of the past tense system in Macedonian resettlement dialects of the Autonomous Province of Vojvodina, Serbia, based on a corpus of texts collected during a 2023 linguistic expedition to the villages of Jabuka, Kačarevo, Glogonj, Plandište, and Belgrade. The first section provides a sociolinguistic overview of the formation of the ...
Added: February 18, 2026
Development of a Language Model for Automated Classification of English-Language Scientific Articles by SRSTI Codes
V. V. Zunin, A. I. Afonin, V. I. Anoshin et al., Automatic Documentation and Mathematical Linguistics 2025 Vol. 59 No. 5 P. 287–293
The development of an artificial intelligence-based language model for classifying English-language scientific articles by SRSTI codes is described. This improves the processes of reviewing and indexing scientific publications. A pre-processed dataset of scientific articles was used for training and testing the models. An architecture for cascade classification was developed, and the performance of models with ...
Added: February 11, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
SynEL: A synthetic benchmark for entity linking
Karpov I., Kirillovich A., Goncharova E. et al., Plos One 2026 Vol. 21 No. 1 Article e0339468
Large language models (LLMs) offer significant potential for constructing commonsense knowledge graphs from text, demonstrating adaptability across diverse domains. However, their effectiveness varies significantly with domain-specific language, highlighting a critical need for specialized benchmarks to assess and optimize knowledge graph construction sub-tasks like named entity recognition, relation extraction, and entity linking. Currently, domain-specific benchmarks are ...
Added: January 15, 2026
Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence
Washington, United States of America: AAAI Press, 2025.
AAAI-25 Technical Tracks 23 (Natural Language Processing II) collects peer-reviewed research papers that advance the state of natural language processing, with an emphasis on large language models, efficient inference, instruction following, retrieval augmentation, and multimodal language understanding. The papers address both theoretical and practical challenges, including model efficiency, interactive generation, grounding in external knowledge and ...
Added: December 18, 2025
Dialect Differences between East and West Based on Data from the Dialectological Atlas of the Russian Language: Results of Multidimensional Scaling
Марченко И. А., Ronko R., in: Исследования по славянской диалектологии. Выпуск 25. Vol. 25. Moscow: Институт славяноведения РАН, 2025. Ch. 5 P. 236–260.
This paper presents a classification of Russian dialects based on data from the Dialectological Atlas of the Russian Language, using the method of multidimensional scaling. The main outcome of the study is a map of the Russian dialectal space, which identifies six zones (three western and three eastern) and corresponding sets of dialectal features. The ...
Added: December 7, 2025
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs
Kudelya A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 1528–1533.
This paper describes LIBU (LoRA enhanced influence-based unlearning), an algorithm to solve the task of unlearning - removing specific knowledge from a large language model without retraining from scratch and compromising its overall utility (SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models). The algorithm combines classical influence functions to remove the influence of ...
Added: November 17, 2025
Empaths at SemEval-2025 Task 11: Retrieval-Augmented Approach to Perceived Emotions Prediction
Morozov L., Mogilevskii A., Shirnin A., in: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025). Association for Computational Linguistics, 2025. P. 2000–2007.
Added: November 17, 2025
Findings of the Association for Computational Linguistics: EMNLP 2025
Association for Computational Linguistics, 2025.
The book contains this year’s edition of the Conference on Empirical Methods in Natural Language Processing! Importantly, it marks the 30th edition of EMNLP. With over 8,000 submissions, more than 3,000 accepted papers, and thousands of attendees, we have come a long way from that first workshop, which had 14 accepted papers. As the field ...
Added: November 16, 2025
A Dialectometric Approach to the Dialect Classification of East Slavic Languages Based on the Collection «Восточнославянские изоглоссы»
Manusov A. V., Кузьмина А. С., Вопросы языкового родства 2024 No. 22/3-4 P. 342–366
The article proposes a new dialectometric approach to the division of East Slavic languages. Our dialectometry is based on the material from the collection of articles “Vostochnoslavyanskie izoglossy” (“East Slavic isoglosses”, 1995–2006), which is a generalization of data from atlases of East Slavic languages (Dialectological atlas of the Russian language, Dialectological atlas of the Belarusian ...
Added: November 13, 2025