Applying statistical tagging to Russian poetry
NRU HSE, 2018. No. 76.
Poetic texts pose a challenge to full morphological tagging and lemmatization because authors seek to extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of the creative language game. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF, and neural network algorithms, as well as one state-of-the-art dictionary-based tagger. The taggers were trained on prose texts and tested on three poetic samples of differing complexity. First, we discuss the method used to compile the gold standard datasets for Russian poetry. Second, we evaluate the taggers’ performance in identifying part-of-speech tags and lemmas. Finally, we analyze the types of errors in the taggers’ output, using the confusion matrix of parts of speech and the mismatches in lemma annotation.
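The evaluation described above (part-of-speech accuracy, lemma accuracy, and a confusion matrix of part-of-speech tags) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function name `evaluate`, the `(token, pos, lemma)` tuple layout, the UD-style tag labels, the case-insensitive lemma comparison, and the three-token sample are all assumptions made for illustration only.

```python
from collections import Counter

def evaluate(gold, pred):
    """Compare tagger output with a gold standard.

    Both arguments are equal-length lists of (token, pos, lemma) tuples
    (a hypothetical layout chosen for this sketch).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold:
        raise ValueError("cannot evaluate an empty sample")

    pos_hits = 0
    lemma_hits = 0
    confusion = Counter()   # (gold_pos, predicted_pos) -> count
    lemma_mismatches = []   # (token, gold_lemma, predicted_lemma)

    for (token, g_pos, g_lemma), (_, p_pos, p_lemma) in zip(gold, pred):
        confusion[(g_pos, p_pos)] += 1
        if g_pos == p_pos:
            pos_hits += 1
        # Assumption: lemmas are compared case-insensitively.
        if g_lemma.lower() == p_lemma.lower():
            lemma_hits += 1
        else:
            lemma_mismatches.append((token, g_lemma, p_lemma))

    total = len(gold)
    return {
        "pos_accuracy": pos_hits / total,
        "lemma_accuracy": lemma_hits / total,
        "confusion": confusion,
        "lemma_mismatches": lemma_mismatches,
    }

# Toy, hand-made sample (not data from the paper).
gold = [
    ("Белеет", "VERB", "белеть"),
    ("парус", "NOUN", "парус"),
    ("одинокой", "ADJ", "одинокий"),
]
pred = [
    ("Белеет", "VERB", "белеть"),
    ("парус", "NOUN", "парус"),
    ("одинокой", "NOUN", "одинокая"),
]

report = evaluate(gold, pred)
print(report["pos_accuracy"], report["lemma_accuracy"])
print(report["confusion"].most_common())
print(report["lemma_mismatches"])
```

In this sketch the off-diagonal cells of `confusion` (where the gold tag differs from the predicted tag) correspond to the part-of-speech confusions, and `lemma_mismatches` collects the tokens whose lemma differs from the gold annotation.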
Language: English
Keywords: natural language processing, Russian language, Russian poetry, NLP evaluation, full morphology tagging