
BERT-like Models for Slavic Morpheme Segmentation

P. 6795–6815.
Morozov D., Astapenka L., Glazkova A., Garipov T., Lyashevskaya O.

Automatic morpheme segmentation algorithms have applications in a range of tasks, such as building tokenizers and language education. For Slavic languages, developing such algorithms is complicated by their rich derivational morphology. Previous research has shown that, on average, these algorithms already reach expert-level quality. However, a key unresolved issue is the significant drop in performance when segmenting words whose roots do not occur in the training data. This problem can be partially addressed by using pre-trained language models, which better capture word semantics. In this work, we explore fine-tuning BERT-like models for morpheme segmentation on Belarusian, Czech, and Russian data. For Czech and Russian, our models outperform all previously proposed approaches, achieving word-level accuracy of 92.5–95.1%. For Belarusian, this task is addressed here for the first time.
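The abstract does not specify how segmentation is posed to a BERT-like model. One common formulation, assumed here and not necessarily the authors', treats it as per-character labelling: each character is tagged 1 if a morpheme boundary follows it and 0 otherwise, so that a token-classification head can be fine-tuned on these targets. The sketch uses plain Python with hypothetical helper names (`to_boundary_labels`, `from_boundary_labels`, `word_level_accuracy`). It converts between morpheme lists and labels, and computes word-level accuracy as the share of words whose entire segmentation matches the gold standard exactly. The segmentation of переписывать is illustrative only.

```python
"""Sketch (assumed formulation, not the paper's code): morpheme segmentation
as per-character boundary labelling, plus a word-level accuracy metric."""

from typing import List, Sequence


def to_boundary_labels(morphs: Sequence[str]) -> List[int]:
    """Tag each character 1 if a morpheme boundary follows it, else 0.

    The last character of the word is always 0: the word end is not a boundary.
    """
    labels: List[int] = []
    for i, morph in enumerate(morphs):
        is_last_morph = i == len(morphs) - 1
        for j in range(len(morph)):
            at_morph_end = j == len(morph) - 1
            labels.append(1 if at_morph_end and not is_last_morph else 0)
    return labels


def from_boundary_labels(word: str, labels: Sequence[int]) -> List[str]:
    """Inverse mapping: cut the word after every character labelled 1."""
    if len(word) != len(labels):
        raise ValueError("expected exactly one label per character")
    morphs: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            morphs.append(word[start:i + 1])
            start = i + 1
    if start < len(word):  # trailing morpheme after the last boundary
        morphs.append(word[start:])
    return morphs


def word_level_accuracy(
    gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]
) -> float:
    """Share of words whose predicted morpheme list equals the gold list exactly."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    correct = sum(list(g) == list(p) for g, p in zip(gold, pred))
    return correct / len(gold)


if __name__ == "__main__":
    # Illustrative segmentation of Russian "переписывать" ('to rewrite').
    morphs = ["пере", "пис", "ыва", "ть"]
    labels = to_boundary_labels(morphs)
    print(labels)  # [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    print(from_boundary_labels("переписывать", labels))  # ['пере', 'пис', 'ыва', 'ть']

    gold = [morphs, ["дом"]]
    pred = [["пере", "писыва", "ть"], ["дом"]]  # one missed boundary
    print(word_level_accuracy(gold, pred))  # 0.5
```

Word-level accuracy is strict: a word with a single misplaced or missed boundary counts as entirely wrong, so it is a harsher measure than per-character or per-boundary scores.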

Language: English
Keywords: morphological analysis, word formation, Slavic languages, morpheme segmentation, automatic morpheme segmentation, word segmentation, morpheme segmentation for Russian, morpheme segmentation for Belarusian, morpheme segmentation for Czech

In book

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Vol. 1: Long Papers. Association for Computational Linguistics, 2025.
Similar publications
Глаголы перемещения веществ в славянских языках [Verbs of substance motion in Slavic languages]
Fedorov D., Jezikoslovni Zapiski 2026 Vol. 32 No. 1 P. 23–52
This article describes verbs denoting the motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...
Added: May 13, 2026
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Association for Computational Linguistics, 2025.
Added: March 10, 2026
Особенности функционирования существительных с опустошенной семантикой в русской разговорной речи [Features of the functioning of nouns with bleached semantics in Russian colloquial speech]
Nikishina E., Труды института русского языка им. В.В. Виноградова 2025 No. 3(45) P. 231–244
The article examines placeholder nouns (штука, фигня, хреновина, etc.) in Russian colloquial speech. These words can function similarly to pronouns by substituting for other words, but their range of functions is much broader than that of pronouns. The study analyzes two groups of vague reference words: initially neutral ones (штука, вещь, дело) and initially evaluative ...
Added: March 8, 2026
Анализ словообразовательных механизмов формирования сленговых выражений молодежной среды в китайском языке [Analysis of the word-formation mechanisms of youth slang expressions in Chinese]
Pavlova O., Раджабова Л. К., Филологические науки. Вопросы теории и практики 2022 Vol. 15 No. 2 P. 559–563
The paper aims to identify the specific word-formation mechanisms by which young people create slang expressions in Chinese. It analyzes how the notion of “slang” is defined in Chinese linguistics. The scientific novelty of the research lies in its analysis of new slang expressions which came ...
Added: February 23, 2026
Перевод терминов аэрокосмической отрасли с помощью компонентного анализа (на материале китайского языка) [Translating aerospace industry terms using componential analysis (based on Chinese data)]
Pavlova O., Известия Волгоградского государственного педагогического университета 2022 No. 4(167) P. 197–202
The article addresses the translation of Chinese aerospace terms into Russian using componential analysis. It presents word-formation models of terminological combinations. The author shows that the productive ways of forming terms of the aerospace ...
Added: February 23, 2026
Youth slang as a social language code: function and formation
Trofimova N., Pesina S., Vinogradova S. et al., Brazilian Journal of Education, Technology and Society - BRAJETS 2025
The article presents numerous functions and special features of slang, which is able to represent a variety of communicative intentions of the speakers. In addition to the desire of communicants to maintain the confidentiality of communication through slang, the latter is used as a means of language economy, as a mechanism for creating new concepts, ...
Added: February 22, 2026
Полисемия агентивных суффиксов в славянских языках: когнитивно-семантический анализ [Polysemy of agentive suffixes in Slavic languages: A cognitive-semantic analysis]
Андреева А. А., Jezikoslovni Zapiski 2025 Vol. 31 No. 1 P. 133–163
The article analyzes the polysemy of agent-noun suffixes in six Slavic languages (Russian, Ukrainian, Polish, Czech, Serbian, and Slovene) using the “metonymic” approach to word formation developed by L. Janda (2011). It examines the semantic properties of the verbs to which agentive suffixes can attach and describes the semantic types represented by the derived nouns. The study shows that agent-noun suffixes serve ...
Added: February 16, 2026
Автоматическое выявление побуждений в тексте: применение методов компьютерной лингвистики в работе эксперта-лингвиста [Automatic detection of inducements in text: Applying computational linguistics methods in the work of a linguistic expert]
П.Е. Белова, А.К. Сафарян, in: Научно-практическая конференция с международным участием "Национальные и международные тенденции и перспективы развития судебной экспертизы" [Scientific and practical conference with international participation "National and International Trends and Prospects in the Development of Forensic Expertise"]. Collection of papers. Nizhny Novgorod: Lobachevsky State University of Nizhny Novgorod Publishing House, 2024.
This article describes FindImper, a system for automatically finding and extracting inducements from Russian-language texts based on searching for verb forms and syntactic relations. The algorithm is implemented in Python using libraries for morphological and syntactic analysis together with a set of rules. The tool is intended to streamline the work of a linguistic expert and is available via a website ...
Added: January 30, 2026
Apposition (Appositional Constructions)
Natalia N. Logvinova, in: Encyclopedia of Slavic Languages and Linguistics Online. Brill, 2025. Ch. 11.
Two types of appositional phrases are distinguished in Slavic languages: close and loose. With close constructions, the issues of syntactic headedness and optional case concord between the parts are discussed. Loose appositions are functionally different from close appositions, having a role comparable to secondary predication. ...
Added: December 22, 2025
Nominative Object
Ronko R., Wiemer B., in: Encyclopedia of Slavic Languages and Linguistics Online. Brill, 2020.
The nominative object describes a clause type in which the object of a transitive verb takes nominative morphology, and this coding is not conditioned by voice operations. It is a salient property in regions in which Slavic varieties have been in contact with Finnic- and/or Baltic-speaking population, i.e., in the eastern part of the Circum-Baltic ...
Added: December 19, 2025
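Диалектометрический подход к диалектной классификации восточнославянских языков на материале сборника «Восточнославянские изоглоссы» [A dialectometric approach to the dialect classification of East Slavic languages based on the collection “East Slavic Isoglosses”]
Manusov A. V., Кузьмина А. С., Вопросы языкового родства 2024 No. 22/3-4 P. 342–366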
The article proposes a new dialectometric approach to the division of East Slavic languages. Our dialectometry is based on the material from the collection of articles “Vostochnoslavyanskie izoglossy” (“East Slavic isoglosses”, 1995–2006), which is a generalization of data from atlases of East Slavic languages (Dialectological atlas of the Russian language, Dialectological atlas of the Belarusian ...
Added: November 13, 2025
Палеославистика 6. Славянское и балканское языкознание. Выпуск 25 [Palaeoslavistica 6. Slavic and Balkan Linguistics. Issue 25]
Савич В., Паскаль А. Д., Вершинин К. В. et al., Полимедиа, 2025.
This volume of the “Slavic and Balkan Linguistics” series presents the monograph “Palaeoslavistica – 6”, written by an international team of researchers. The sections of the co-authored monograph are devoted to the latest results of ongoing research on Slavic manuscripts written in the 10th–14th centuries, their language, textology, and palaeography. ...
Added: November 12, 2025
Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?
Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84
Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...
Added: January 7, 2025
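Проприетивные и привативные аффиксы в некоторых уральских языках: о маркированности и (не)словоизменительном статусе [Proprietive and privative affixes in some Uralic languages: On markedness and (non-)inflectional status]
Лапшина К. М., Вопросы языкознания 2025 No. 1 P. 95–118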
This paper studies the morphosyntactic properties of bound proprietive and privative markers in some Uralic languages using corpus data and grammatical descriptions. Such affixes are predominantly attached to substantive bases and form derivatives with the meaning of ‘possessing X’ and ‘deprived of X’, respectively. The first part of the study is devoted to the comparison of ...
Added: December 19, 2024