Applying statistical tagging to Russian poetry

NRU HSE, 2018. No. 76.
Starchenko A., Kazakevich L., Lyashevskaya O.
Poetic texts pose a challenge to full morphological tagging and lemmatization: authors extend the vocabulary, employ morphologically and semantically deficient forms, go beyond standard syntactic templates, and use non-projective constructions and non-standard word order, among other techniques of creative language play. In this paper we evaluate a number of probabilistic taggers based on decision trees, CRF, and neural network algorithms, as well as one state-of-the-art dictionary-based tagger. The taggers were trained on prose texts and tested on three poetic samples of differing complexity. First, we discuss the method used to compile gold-standard datasets for Russian poetry. Second, we evaluate the taggers' performance in identifying part-of-speech tags and lemmas. Finally, we analyze the types of errors in the taggers' output, examining the part-of-speech confusion matrix and mismatches in lemma annotation.
Research target: Philology and Linguistics; Computer Science
Priority areas: humanities; IT and mathematics
Language: English
Keywords: natural language processing, Russian language, Russian poetry, NLP evaluation, full morphology tagging
Publication based on the results of:
Materials for a Frequency Dictionary of Russian Poetry (Материалы к частотному словарю русской поэзии), 2018
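
The evaluation outlined in the abstract (part-of-speech accuracy, lemma accuracy, a part-of-speech confusion matrix, and a list of lemma mismatches) could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `evaluate`, the UD-style tags, the case-insensitive lemma comparison, and the toy gold and predicted annotations are all assumptions made for the example, and the "predicted" errors are invented rather than taken from any tagger in the paper.

```python
from collections import Counter

# Hypothetical token-aligned annotations: (token, POS tag, lemma).
# The line is from Pushkin; the "predicted" errors are invented for illustration.
GOLD = [
    ("буря", "NOUN", "буря"),
    ("мглою", "NOUN", "мгла"),
    ("небо", "NOUN", "небо"),
    ("кроет", "VERB", "крыть"),
]
PREDICTED = [
    ("буря", "NOUN", "буря"),
    ("мглою", "ADV", "мглою"),
    ("небо", "NOUN", "небо"),
    ("кроет", "VERB", "кроить"),
]


def evaluate(gold, predicted):
    """Compare predicted tags and lemmas against a gold standard.

    Both arguments are equal-length lists of (token, pos, lemma) tuples.
    Lemmas are compared case-insensitively (an assumption of this sketch).
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and token-aligned")

    pos_hits = 0
    lemma_hits = 0
    confusion = Counter()   # (gold POS, predicted POS) -> count
    lemma_mismatches = []   # (token, gold lemma, predicted lemma)

    for (token, g_pos, g_lemma), (_, p_pos, p_lemma) in zip(gold, predicted):
        confusion[(g_pos, p_pos)] += 1
        if g_pos == p_pos:
            pos_hits += 1
        if g_lemma.casefold() == p_lemma.casefold():
            lemma_hits += 1
        else:
            lemma_mismatches.append((token, g_lemma, p_lemma))

    total = len(gold)
    return {
        "pos_accuracy": pos_hits / total,
        "lemma_accuracy": lemma_hits / total,
        "confusion": confusion,
        "lemma_mismatches": lemma_mismatches,
    }


report = evaluate(GOLD, PREDICTED)
print("POS accuracy:", report["pos_accuracy"])
print("Lemma accuracy:", report["lemma_accuracy"])
for (g_pos, p_pos), count in sorted(report["confusion"].items()):
    print(f"gold={g_pos} predicted={p_pos}: {count}")
for token, g_lemma, p_lemma in report["lemma_mismatches"]:
    print(f"{token}: gold lemma {g_lemma!r}, predicted {p_lemma!r}")
```

Keeping the confusion counts keyed by (gold, predicted) pairs makes the off-diagonal cells, such as a noun tagged as an adverb, easy to list and rank when looking for systematic errors.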