An HMM-based PoS tagger for Old Church Slavonic

O. Lyashevskaya; I. Afanasev

doi:10.2478/jazcas-2021-0051

Jazykovedny Casopis. 2021. Vol. 72. No. 2. P. 556–567.

We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a 40k portion of a single text, Codex Marianus, annotated with Universal Dependencies UPOS tags in the UD-PROIEL treebank. We run a number of experiments in within-domain and out-of-domain settings: the remaining part of Codex Marianus serves as the within-domain test set, and the Kiev Folia are used as the out-of-domain test set. Based on an analysis of by-PoS-class precision and sensitivity (recall) in each run, we combine a simple context-free n-gram-based approach with a Hidden Markov Model (HMM) and add linguistic rules for specific cases such as punctuation and digits. While the model achieves a rather unimpressive accuracy of 81% in the within-domain setting, we observe an accuracy of 51% in out-of-domain evaluation, which is comparable to the results of large neural architectures based on pre-trained contextual embeddings.
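The abstract describes the architecture only at a high level. As a rough illustration of how such a hybrid can be wired together, here is a minimal Python sketch, not the authors' implementation. It assumes a bigram HMM decoded with the Viterbi algorithm and add-one smoothing. It also assumes that a last-three-characters suffix model stands in for the "context-free n-gram-based" component (our assumption; the paper's component may differ) and that regex rules handle punctuation and digits. The ASCII-digit pattern is only a placeholder, and the names `rule_tag`, `HybridHMMTagger` and `per_class_scores` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed rule layer: tokens consisting only of punctuation, or only of ASCII
# digits, get a fixed UPOS tag and bypass the statistical model.
PUNCT_RE = re.compile(r"[^\w\s]+")
DIGIT_RE = re.compile(r"[0-9]+")


def rule_tag(token):
    """Return a forced UPOS tag for punctuation or digits, else None."""
    if PUNCT_RE.fullmatch(token):
        return "PUNCT"
    if DIGIT_RE.fullmatch(token):
        return "NUM"
    return None


class HybridHMMTagger:
    """Bigram HMM + Viterbi, with a suffix fallback for unseen words and rule overrides."""

    def __init__(self, sentences, suffix_len=3):
        self.suffix_len = suffix_len
        self.trans = defaultdict(Counter)   # previous tag -> next-tag counts
        self.emit = defaultdict(Counter)    # tag -> word counts
        self.suffix = defaultdict(Counter)  # word suffix -> tag counts
        self.tag_counts = Counter()
        for sent in sentences:
            prev = "<S>"
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.suffix[word[-suffix_len:]][tag] += 1
                self.tag_counts[tag] += 1
                prev = tag
        self.tags = sorted(self.tag_counts)
        self.vocab = {w for counts in self.emit.values() for w in counts}
        self.total = sum(self.tag_counts.values())

    def log_trans(self, prev, tag):
        # Add-one smoothed log P(tag | prev).
        row = self.trans.get(prev, Counter())
        return math.log((row[tag] + 1) / (sum(row.values()) + len(self.tags) + 1))

    def log_emit(self, tag, word):
        if word in self.vocab:  # seen word: add-one smoothed log P(word | tag)
            count = self.emit.get(tag, Counter())[word]
            return math.log((count + 1) / (self.tag_counts[tag] + len(self.vocab) + 1))
        # Unseen word: context-free suffix score log(P(tag | suffix) / P(tag)).
        row = self.suffix.get(word[-self.suffix_len:], Counter())
        p_tag_given_suffix = (row[tag] + 1) / (sum(row.values()) + len(self.tags) + 1)
        p_tag = (self.tag_counts[tag] + 1) / (self.total + len(self.tags) + 1)
        return math.log(p_tag_given_suffix / p_tag)

    def tag(self, words):
        """Viterbi decoding; rule-matched tokens are restricted to a single tag."""
        lattice = []  # lattice[i][tag] = (best log-score of a path ending there, previous tag)
        for i, word in enumerate(words):
            forced = rule_tag(word)
            candidates = [forced] if forced else self.tags
            column = {}
            for t in candidates:
                emit = self.log_emit(t, word)
                if i == 0:
                    column[t] = (self.log_trans("<S>", t) + emit, None)
                    continue
                prev, score = max(
                    ((p, s + self.log_trans(p, t)) for p, (s, _) in lattice[-1].items()),
                    key=lambda item: item[1],
                )
                column[t] = (score + emit, prev)
            lattice.append(column)
        if not lattice:
            return []
        best = max(lattice[-1], key=lambda t: lattice[-1][t][0])
        path = [best]
        for column in reversed(lattice[1:]):
            best = column[best][1]
            path.append(best)
        return path[::-1]


def per_class_scores(gold, predicted):
    """Per-tag (precision, sensitivity) over aligned gold/predicted tag sequences."""
    tp, pred_n, gold_n = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        gold_n[g] += 1
        pred_n[p] += 1
        if g == p:
            tp[g] += 1
    return {
        t: (tp[t] / pred_n[t] if pred_n[t] else 0.0, tp[t] / gold_n[t] if gold_n[t] else 0.0)
        for t in sorted(gold_n | pred_n)
    }
```

Suppose the tagger is trained on two toy sentences written for illustration (not taken from the treebank): `[("исоусъ", "PROPN"), ("глагола", "VERB"), ("имъ", "PRON"), (".", "PUNCT")]` and `[("ученици", "NOUN"), ("глаголаахѫ", "VERB"), ("емоу", "PRON"), (".", "PUNCT")]`. Then `tag(["исоусъ", "глагола", "12", "."])` returns `["PROPN", "VERB", "NUM", "PUNCT"]`. The last two tags come from the rules, and the first two come from Viterbi decoding. In this sketch the rules restrict the lattice rather than overwrite the output afterwards, so forced tags still shape the transition scores of their neighbours. `per_class_scores` yields the kind of by-class precision and sensitivity figures that could guide which component handles which PoS class.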

Research target: Philology and Linguistics

Keywords: hybrid models, hidden Markov models, Old Church Slavonic, morphological tagging, Universal Dependencies, POS tagging, HMM tagger

Chinese: Second Foreign Language: Grade 7: Collection of Grammar Exercises: Textbook

Sizova A., Просвещение, 2025.

This grammar practice workbook for Grade 7 learners from the "It's Time to Learn Chinese!" series is designed to strengthen students’ grammatical accuracy across a range of language skills and communicative activities. It offers an extensive collection of practice exercises, complemented by engaging tasks of varying formats and levels of difficulty. In addition, the book ...

Added: February 4, 2026

Natural Language Processing and Information Systems: 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025: Proceedings, Part I

Springer, 2025.

The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...

Added: February 3, 2026

No ‘iota’ type-shifter in Kazym Khanty

Tiutiunnikova V., Mikhailov Stiopa, Golosov F., Proceedings of Sinn und Bedeutung 2025 No. 29 P. 1593–1608

In this paper, we present new challenging data from Kazym Khanty (a Uralic language spoken in Western Siberia, Russia): in this articleless language, bare singular and bare dual NPs in argument positions can receive indefinite readings on par with definite ones, contradicting the predictions of the classic neo-Carlsonian approach (Chierchia, 1998; Dayal, 2004). We argue ...

Added: January 30, 2026

The Use of Ordinal Numerals in Different Semantic Contexts (Based on Parallel Translations of the New Testament)

Nasledskova P., Известия РАН. Серия литературы и языка 2025 Vol. 84 No. 6 P. 88–102

The paper compares the use of ordinal constructions in different semantic contexts in five languages: Russian, English, Spanish, Indonesian, and Rutul. The comparison is based on parallel translations of the New Testament. From six books of the New Testament (the canonical Gospels, the Acts of the Apostles, and the Revelation of John the Theologian), verses were selected in which ordinal numerals are used in at least one of the sampled languages. ...

Added: January 29, 2026

Using AI Technologies in Teaching Students within the Course "Academic Writing in English"

Gabrielova E., Магия ИННО 2025 Vol. 7 No. 1 P. 165–172

Artificial intelligence (AI) technologies are rapidly developing and are being widely applied in various fields, including education. The use of AI carries certain risks; however, one cannot completely reject it in student education. The article presents the experience of using AI in teaching English to 34 fourth-year students and 26 post-graduate students within the discipline ...

Added: January 29, 2026

Explorations in Applied Ethnolinguistics: Words, Cultures, and Global Perspectives

Palgrave Macmillan, 2025.

This volume contributes to the growing body of cutting-edge research into the Natural Semantic Metalanguage (NSM) approach in linguistics. It explores the broad range of possible applications enabled by the NSM approach, from linguistic studies of semantics and culture to cross-cultural studies, psychology and childhood education. The volume builds on previous studies, bringing a diversity ...

Added: January 28, 2026

The Epic of Gilgamesh. Translated by Nikolai Gumilev. Foreword by E. Markina. Introduction by V. Shileiko.

Markina E., Манн, Иванов и Фербер, 2025.

Publisher's annotation: The Epic of Gilgamesh is the most ancient monument of world literature, which has come down to us from the depths of the Sumerian and Akkadian civilizations. The poem tells of the adventures of the mighty king of the city of Uruk and his friend Enkidu. It is a story of strength and friendship, pride and humility, the fear of death and the thirst for immortality. The poem is published in the translation by the Acmeist poet Nikolai Gumilev, with an explanatory essay by Vladimir Shileiko, an Assyriologist and a contemporary of the poet, ...

Added: January 28, 2026

The representation of the climate crisis in Croatian online news media

Šarić L., Trnavac R., Frontiers in Communication 2026

This study analyzes agency and news values across verbal and visual modalities in Croatian online news on the climate crisis, examining how climate change is portrayed. We explore newsworthiness, visual framing, and metaphor, linking agency to broader concerns about responsibility. In addition, the analysis traces how different types of agency shape news values in both metaphorical and nonmetaphorical micro-contexts. To ...

Added: January 27, 2026

Semi-fake indexicals in Russian

Тискин Д. Б., Типология морфосинтаксических параметров 2025 Vol. 8 No. 1 P. 112–129

There are several rival theories of fake indexicals, i.e. bound indexicals (prominently pronouns) whose φ-features do not semantically contribute to focus alternatives (e.g. Only Mary did her homework, John didn’t do his). According to Minimal Pronoun theories (such as Kratzer’s or Wurmbrand’s), bound pronouns are Merged without φ-features and acquire them under binding via agreement-like ...

Added: January 26, 2026

Some Modifications to I. Bassi's Theory of Bound Uses of Indexical Expressions

Тискин Д. Б., Типология морфосинтаксических параметров 2024 Vol. 7 No. 1 P. 107–123

Fake indexicals (FIs), or bound-variable uses of, e.g., 1st- and 2nd-person pronouns, have been analysed by Bassi (2021) as arising from a post-syntactic process of inspecting the features of the referent. This leads to a peculiar analysis of the syntax and semantics of relative clauses containing FIs. I argue for a more ...

Added: January 26, 2026

The Art of (Un)simple Legal Writing: A Textbook

Knutov A., Chaplinskiy A., Мищенко П. А. et al., Moscow: Проспект, 2026.

The textbook offers recommendations on legal writing style; following them helps make legal texts more understandable to readers. The first chapter systematizes accumulated knowledge about the general stylistic features of legal language and its place in the speech system of the Russian language. The subsequent chapters are devoted to particular types of legal documents: the language of statutes, of procedural documents, of contracts, and of legal analytical documents. ...

Added: January 26, 2026

From the Correspondence of E. A. Millior with Ya. M. Borovsky (1946–1960)

Ermakova L., Вестник Удмуртского университета. Серия История и филология 2025 Vol. 35 No. 6 P. 1403–1422

The article publishes and analyzes the correspondence between the historian of antiquity Elena A. Millior (1900–1978) and the classical philologist Yakov M. Borovsky (1896–1994), covering the years 1946–1960 and preserved in the archives of the Institute of Russian Literature (Pushkin House) of the Russian Academy of Sciences and the Bibliotheca Classica Petropolitana in St. Petersburg. ...

Added: January 26, 2026

The Works of D. N. Mamin-Sibiryak and the Modern World

Moscow, Yekaterinburg: Кабинетный ученый, 2024.

The monograph examines the work of D. N. Mamin-Sibiryak, a classic of 19th-century Ural and all-Russian literature. It studies and describes various aspects of his artistic world: axiological and ethical issues of both universal and national character, questions of geo- and ethnopoetics, features of the narrative organization of his texts and of the writer's artistic language, Mamin's genealogy, and applied aspects of his work, including the presentation of the writer's legacy to a modern audience. The edition includes an index of Mamin-Sibiryak's works. The book is intended for ...

Added: January 26, 2026

Hegel's "Philosophy of Right" and the Kotzebue Affair: The Cultural and Political Context

Lagutina I., Философические письма. Русско-европейский диалог 2025 Vol. 8 No. 4 P. 165–201

This article examines the assassination of the playwright August von Kotzebue by the theology student K. L. Sand as an event reflecting the ideological and philosophical tensions of early nineteenth-century Germany. It analyzes G. W. F. Hegel’s response to this historical episode in the context of his “Philosophy of Right”, which criticizes ethical and religious ...

Added: January 25, 2026

Language Models for Text Preprocessing in Machine Translation

Mylnikova A., Mylnikov L., Научно-техническая информация. Серия 2: Информационные процессы и системы 2025 No. 7 P. 32–44

The paper considers a model that uses skeleton structures based on syntactic annotation to preprocess text corpora before they are passed to neural machine translation models, with the aim of improving translation quality. The model is implemented through part-of-speech and syntactic annotation of the corpora, produced with a language model (a BERT network) combined with a set of rules. The preparation of training data is described, and ways of improving the efficiency ...

Added: September 22, 2025

Early warning system for Russian stock market crises: TCN-LSTM-Attention model using imbalanced data and attention mechanism

Teplova T., Fayzulin M., Kurkin A., Socio-Economic Planning Sciences 2025 No. 101 Article 102292

This research is devoted to the development and evaluation of the effectiveness of machine learning and deep learning models for forecasting crisis phenomena in the Russian stock market. The work covers the period from the beginning of 2014 to June 2024, using the IMOEX index as the main indicator of the market condition. Special attention ...

Added: August 2, 2025

Afanasev I., Lyashevskaya O., , in: Structuring Lexical Data and Digitising Dictionaries: Grammatical Theory, Language Processing and Databases in Historical Linguistics.: Boston, Leiden: Brill, 2024. P. 13–35.

Added: January 7, 2025

The Perfect in Old Church Slavonic: Was It Resultative?

Plungian V., Урманчиева А. Ю., Slovĕne 2017 Vol. 6 No. 2 P. 13–56

The perfect is known to be one of the most enigmatic forms of Old Church Slavonic, and its semantics has stubbornly resisted description. Old Church Slavonic texts are translations (primarily from Greek), and they show extensive calquing both in the lexicon and in grammatical forms and constructions. Yet it is precisely the perfect that breaks this pattern: the correspondences of perfect ...

Added: November 11, 2023

Towards a Typology of the Non-Resultative Perfect (Based on Old Church Slavonic Data)

Plungian V., Урманчиева А. Ю., Slavisticna Revija 2018 Vol. 66 No. 4 P. 421–440

The paper attempts to move closer to an understanding of the semantics of the Old Church Slavonic perfect, an analytic verb form consisting of the l-participle of the lexical verb and the auxiliary verb byti in the present tense. Typical contexts of use of this form are identified; the main aim of the study is to evaluate the resulting "semantic portrait" of the Old Church Slavonic perfect against the typological expectations that have developed regarding perfect forms in the world's languages. It is shown ...

Added: November 11, 2023

Disambiguation in context in the Russian National Corpus: 20 years later

Lyashevskaya O., Afanasev I., Stefan Rebrikov et al., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог». Issue 22. [s.n.], 2023. P. 307–318.

An updated annotation of the Main, Media, and some other corpora of the Russian National Corpus (RNC) features the part-of-speech and other morphological information, lemmas, dependency structures, and constituency types. Transformer-based architectures are used to resolve the homonymy in context according to a schema based on the manually disambiguated subcorpus of the Main corpus (morphology ...

Added: September 15, 2023

The Old Church Slavonic Corpora and Their Use in Language Studies at the University

Afanasev I., Babanov A., in: Literature, Language and Computing: Russian Contribution. Springer, 2023.

Added: September 15, 2023

The Use of Khislavichi Lect Morphological Tagging to Determine its Position in the East Slavic Group

Afanasev I., in: Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023). Association for Computational Linguistics, 2023. P. 174–186.

The study of low-resourced East Slavic lects is becoming increasingly relevant as they face the prospect of extinction under the pressure of standard Russian while being treated by academia as an inferior part of this lect. The Khislavichi lect, spoken in a settlement on the border of Russia and Belarus, is a perfect example of ...

Added: May 15, 2023

Proceedings of Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)

Association for Computational Linguistics, 2023.

These proceedings include the 23 papers presented at the 10th Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), co-located with the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Both EACL and VarDial were held in Dubrovnik, Croatia, in a hybrid format, allowing participants to attend on-site or ...

Added: May 15, 2023

Building a Universal Dependencies Treebank for a Polysynthetic Language: the Case of Abaza

Koshevoy A., Panova A., Makarchuk I., in: Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023). Washington: Association for Computational Linguistics, 2023. P. 1–6.

In this paper, we discuss the challenges that we faced during the construction of a Universal Dependencies treebank for Abaza, a polysynthetic Northwest Caucasian language. We propose an alternative to the morpheme-level annotation of polysynthetic languages introduced in Park et al. (2021). Our approach aims at reducing the number of morphological features, yet providing all ...

Added: March 20, 2023