• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Articles
  • An HMM-based PoS tagger for Old Church Slavonic
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 25, 2026
HSE Scientists Train Neural Network to 'Hear' Faults in Electric Motors
Researchers at the AI and Digital Science Institute of the HSE Faculty of Computer Science have developed a new method—the Signature-Guided Data Augmentation (SGDA) framework—that achieves 99% accuracy in motor fault detection and 86% accuracy in fault classification. The application of this approach can reduce industrial equipment repair costs, minimise downtime, and improve production safety. The study results have been published in Engineering Applications of Artificial Intelligence.
May 25, 2026
'The Humanities Serve as a Conscience'
Maria Mizernaia studies Soviet literature and the history of book publishing. In this interview for the HSE Young Scientists project, she discusses plans to publish a novel about besieged Leningrad, AI-provoked reflections on what it means to be human, and how novels can help satisfy our dopamine hunger.
May 25, 2026
Is It Possible to Predict a Citys Life Based on the Shape of Its Neighbourhoods?
Is it possible to predict, based on the configuration of streets and buildings, where a café will open or where traffic congestion will occur? Participants in the Spatial Analysis and Modelling of Urban Processes research and study group use open data and machine learning to identify universal patterns. Alexander Sheludkov and Eduard Somov discuss the purpose of comparing cities, the need for new forms of urban statistics, and how open data is transforming approaches to urban studies.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

An HMM-based PoS tagger for Old Church Slavonic

Jazykovedny Casopis. 2021. Vol. 72. No. 2. P. 556–567.
Lyashevskaya O., Afanasev I.

We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of one text, Codex Marianus (40k) annotated with the Universal Dependencies UPOS tags in the UD-PROIEL treebank. We perform a number of experiments in within-domain and out-of-domain settings, in which the remaining part of Codex Marianus serves as a within-domain test set, and Kiev Folia is used as an out-of- domain test set. Analysing by-PoS-class precision and sensitivity in each run, we combine a simple context-free n-gram-based approach and Hidden Markov method (HMM), and added linguistic rules for specific cases such as punctuation and digits. While the model achieves a rather non-impressive accuracy of 81% in in-domain settings, we observe an accuracy of 51% in out-of-domain evaluation, which is comparable to the results of large neural architectures based on pre-trained contextual embeddings.

Research target: Philology and Linguistics
Language: English
Full text
DOI
Text on another site
Keywords: гибридные моделиhybrid modelsскрытые Марковские моделистарославянский языкOld Church Slavonicморфологическая разметкаuniversal dependenciesуниверсальные зависимостиPOS taggingчастеречная разметкаHMM tagger
Similar publications
Лингвистический анализ рекламы парфюма в англоязычном и русскоязычном дискурсах
Gabrielova E., Шевякова Ю. С., Вестник Удмуртского университета 2026 Vol. 36 No. 2 P. 344–354
In today's globalized world, the effectiveness of sales and the success of products largely rely on well-crafted advertising texts. Influenced by this factor and the growing competition, advertising continuously evolves, incorporating various linguistic, psychological, and cross-cultural techniques. This study focuses on the linguistic and stylistic analysis of perfume advertising texts within English and Russian discourses, ...
Added: May 25, 2026
On the Curse Formula in Wʿzb’s Inscription (RIÉ 192 B, ll. 5–9)
Bulakh M., Aethiopica 2025 Vol. 28 P. 39–52
The article deals with the curse formula belonging to the sixth-century inscription by an Aksumite king Wʿzb (RIÉ 192 B, ll. 5–9). After summarizing the extant interpretations, the author proposes a new reading and interpretation, arguing that the text under scrutiny follows the same pattern and employs the same rhetoric devices as the curse formulas ...
Added: May 23, 2026
Practicamos el Subjuntivo
Bocharov Y., M.: -, 2025.
This textbook is designed for students improving their Spanish proficiency at levels B1-B2. It consists of five topics and a selection of texts to reinforce them. The first topic covers the morphology of the four tenses (present, perfect, imperfect, subjunctive perfect) and exercises on the formation of forms. The remaining topics are devoted to exploring ...
Added: May 23, 2026
Эстетика аудиовизуальной журналистики. Учебное пособие. 2-е издание
Novikova A., Бережная М. А., Кирия И. В., КноРус, 2026.
The aesthetics of journalism is substantiated as a necessary component in the professional training of specialists in audiovisual media. The factors and trends of historical and current changes in the aesthetics of journalism are presented, and the aesthetic practices of audiovisual journalism are characterized in terms of their social functioning. Criteria for aesthetic evaluation are ...
Added: May 22, 2026
Juxtapositional vs. possessive-like encoding in Russian specificational constructions
Logvinova N., Russian linguistics 2026 Vol. 50 Article 11
This paper presents the first in-depth corpus-based study of a previously overlooked syntactic variation in Russian: the competition between juxtapositional (Nominative) and possessive-like (Genitive) encoding of the second noun (the term) in specificational constructions (e.g., ponjatie čest’ (notion.NOM honor.NOM) vs. ponjatie česti (notion.NOMhonor.GEN) ‘the notion of honor’). While typological research has established cross-linguistic preferences for one encoding strategy over another, intralinguistic variation ...
Added: May 18, 2026
FOCUS ON VOCABULARY Экономика материальных и нематериальных активов: корпусный словарь и ИИ-упражнения по английскому языку
Gorina O. G., Kucherenko S., Larisa K. et al., St. Petersburg: Asterion, 2026.
This textbook is an integrated teaching and learning resource for English for Specific Purposes (ESP) in the field of economics of tangible and intangible assets. Its design employs (i) modern corpus linguistics methods, including frequency analysis and keyword extraction based on authentic texts reflecting current trends in professional discourse, and (ii) artificial intelligence technologies for ...
Added: May 16, 2026
КОГНИТИВНО-АССОЦИАТИВНОЕ ПОЛЕ ОНИМОВ САНКТ-ПЕТЕРБУРГА И ВЕНЫ
Зелинская Ю. Ю., Когнитивные исследования языка 2025 № 4(65) С. 180–186
The article focuses on the study of the onym as a cognitive stimulus that facilitates the decoding of the language of urban space across two ethnic groups. The research is grounded in the analysis of results from an onomastic associative experiment, aimed at identifying the dominant types of associative responses to anthroponyms, oikodonyms, hodonyms, and ...
Added: May 16, 2026
Лично-числовая асимметрия: согласование пассивных миративов в казымском диалекте хантыйского языка
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Т. 6 № 1 С. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
Глаголы перемещения веществ в славянских языках
Fedorov D., Jezikoslovni Zapiski 2026 Т. 32 № 1 С. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic langu­ages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent gram­matical phenomena such as argument ...
Added: May 13, 2026
Образ женщины сквозь года: диахронический анализ репрезентации женщин в российской агитационной рекламе
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Т. 23 № 1 С. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
«Плоский мир» Т. Пратчетта глазами русскоязычного фандома
Кульков А. Н., Tsvetkova M. V., Вестник Томского государственного университета. Филология 2026 № 100 С. 158–173
Впервые делается попытка рассмотреть особенности фанфикшн как акта продуктивной рецепции, возникшего на основе цикла романов Терри Пратчетта о Плоском мире в России. Проведенный анализ показывает, что прежде всего авторы фанфиков стремятся передать стилистику и комическое начало оригинального цикла Пратчетта, вне зависимости от жанра и формата создаваемых ими произведений. Фикрайтеры наиболее часто обращаются к таким форматам, ...
Added: May 10, 2026
Transformer-based approaches for lemmatizing abbreviations in Russian texts
Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47
This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...
Added: March 10, 2026
Грамматический ландшафт художественной прозы: динамика частеречных распределений в русском рассказе XX века
Kirina M., В кн.: Русская грамматика: полипарадигмальность как методологический принцип современных научных исследований : материалы IX Международного научного симпозиума.: Издательство ИГУ, 2025. С. 270–275.
В статье представлены результаты пилотного исследования, направленного на описание дистрибуции частей речи в синхронии и диахронии на материале русской прозы малой формы. Рассматриваются изменения морфологического состава художественных текстов (на уровне грамматических классов) на протяжении XX века в соответствии с 9 историко-культурными периодами. Материалом исследования выступает выборка из 943 рассказов суммарным объемом более 3 млн. словоупотреблений. ...
Added: February 28, 2026
Языковые модели для предобработки текстов в машинном переводе
Mylnikova A., Mylnikov L., Научно-техническая информация. Серия 2: Информационные процессы и системы 2025 № 7 С. 32–44
Рассмотрена модель использования скелетных структур на базе синтаксической разметки для предобработки корпусов текстов перед передачей в нейросетевые модели машинного перевода с целью повышения качества их работы, реализованная с помощью частеречной и синтаксической разметок корпусов текстов, использующих языковую модель, с использованием сети BERT и набора правил. Описана подготовка данных для обучения и предложены способы повышения эффективности ...
Added: September 22, 2025
Early warning system for Russian stock market crises: TCN-LSTM-Attention model using imbalanced data and attention mechanism
Teplova T., Fayzulin M., Kurkin A., Socio-Economic Planning Sciences 2025 No. 101 Article 102292
This research is devoted to the development and evaluation of the effectiveness of machine learning and deep learning models for forecasting crisis phenomena in the Russian stock market. The work covers the period from the beginning of 2014 to June 2024, using the IMOEX index as the main indicator of the market condition. Special attention ...
Added: August 2, 2025
String Similarity Measures for Evaluating the Lemmatisation in Old Church Slavonic
Afanasev I., Lyashevskaya O., , in: Structuring Lexical Data and Digitising Dictionaries: Grammatical Theory, Language Processing and Databases in Historical Linguistics.: Boston, Leiden: Brill, 2024. P. 13–35.
Added: January 7, 2025
Перфект в старославянском: был ли он результативным?
Plungian V., Урманчиева А. Ю., Slovĕne 2017 Т. 6 № 2 С. 13–56
Перфект, как известно, является одной из самых загадочных форм старославянского языка, семантика которой упорно не поддается описанию. Старославянские тексты представляют собой переводы (прежде всего — с греческого), и в них в значительной степени наблюдается калькирование как в сфере лексики, так и в сфере грамматических форм и конструкций. Но именно перфект нарушает эту картину: соответствия перфектных ...
Added: November 11, 2023
К типологии нерезультативного перфекта (на материале старославянского языка)
Plungian V., Урманчиева А. Ю., Slavisticna Revija 2018 Т. 66 № 4 С. 421–440
В работе предпринята попытка приблизиться к пониманию семантики старославянского перфекта — аналитической глагольной формы, состоящей из l-причастия смыслового глагола и вспомогательного глагола byti в презенсе. Выделены типичные контексты употребления данной формы; основной задачей исследования является оценка получившегося «семантического портрета» старославянского перфекта с точки зрения типологических ожиданий, сформировавшихся в отношении перфектных форм в языках мира. Показано, ...
Added: November 11, 2023
Disambiguation in context in the Russian National Corpus: 20 yeas later
Lyashevskaya O., Afanasev I., Stefan Rebrikov et al., , in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог». Вып. 22.Вып. 22.: [б.и.], 2023. P. 307–318.
An updated annotation of the Main, Media, and some other corpora of the Russian National Corpus (RNC) features the part-of-speech and other morphological information, lemmas, dependency structures, and constituency types. Transformer-based architectures are used to resolve the homonymy in context according to a schema based on the manually disambiguated subcorpus of the Main corpus (morphology ...
Added: September 15, 2023
The Old Church Slavonic Corpora and Their Use in Language Studies at the University
Afanasev I., Babanov A., , in: Literature, Language and Computing: Russian Contribution.: Springer, 2023.
Added: September 15, 2023
The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group
Afanasev I., , in: Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023).: Association for Computational Linguistics, 2023. P. 174–186.
The study of low-resourced East Slavic lects is becoming increasingly relevant as they face the prospect of extinction under the pressure of standard Russian while being treated by academia as an inferior part of this lect. The Khislavichi lect, spoken in a settlement on the border of Russia and Belarus, is a perfect example of ...
Added: May 15, 2023
Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
Association for Computational Linguistics, 2023.
These proceedings include the 23 papers presented at the 10th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), co-located with the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Both EACL and VarDial were held in Dubrovnik, Croatia, in a hybrid format, allowing participants to attend on-site or ...
Added: May 15, 2023
Building a Universal Dependencies Treebank for a Polysynthetic Language: the Case of Abaza
Koshevoy A., Panova A., Makarchuk I., , in: Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023).: Washington: Association for Computational Linguistics, 2023. P. 1–6.
In this paper, we discuss the challenges that we faced during the construction of a Universal Dependencies treebank for Abaza, a polysynthetic Northwest Caucasian language. We propose an alternative to the morpheme-level annotation of polysynthetic languages introduced in Park et al. (2021). Our approach aims at reducing the number of morphological features, yet providing all ...
Added: March 20, 2023
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit