
 


A hybrid lemmatiser for Old Church Slavonic

NRU HSE, 2021.
Afanasev I.
The article presents a lemmatiser developed specifically for Old Church Slavonic (OCS). The introduction highlights the lack of lemmatisers able to handle the different OCS datasets. The review briefly describes previous attempts and current trends in lemmatisation. The lemmatiser is hybrid: it applies linguistic rules to specific cases (fragmentary tokens, punctuation, and digits), a dictionary to the most common tokens, and a sequence-to-sequence (seq2seq) neural network with an attention mechanism to the remaining material. The model achieves an overall accuracy of 85%, which is lower than that of one of the previous models on the Universal Dependencies (UD) dataset. However, on specific token types, the model outperforms previous ones thanks to its rule-based component. Possible directions for further research include more sophisticated architectures such as BART.
Research target: Philology and Linguistics
Priority areas: humanitarian
Language: English
Keywords: natural language processing (NLP), lemmatization, hybrid approach, Old Church Slavonic, seq2seq
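The abstract describes a three-stage cascade: rules for special token classes, a dictionary for frequent forms, and a seq2seq network for everything else. The Python sketch below shows one way such a cascade could be wired up; it is not the author's implementation. The regular expressions, the square-bracket convention for fragmentary tokens, the `min_count` frequency cut-off, the toy OCS word pairs, and the identity function standing in for the trained seq2seq model are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

DIGIT_RE = re.compile(r"^\d+$")
PUNCT_RE = re.compile(r"^[^\w\s]+$")
# Hypothetical convention: damaged/fragmentary tokens are marked with square brackets.
FRAGMENT_RE = re.compile(r"[\[\]]")


def rule_lemma(token: str) -> Optional[str]:
    """Stage 1: deterministic rules; returns None if no rule applies."""
    if DIGIT_RE.match(token) or PUNCT_RE.match(token) or FRAGMENT_RE.search(token):
        # Assumption: digits, punctuation and fragments are their own lemma.
        return token
    return None


def build_dictionary(pairs: Iterable[Tuple[str, str]], min_count: int = 2) -> Dict[str, str]:
    """Stage 2 data: map each frequent form to its most frequent lemma.

    min_count is a hypothetical cut-off for what counts as a "most common token".
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for form, lemma in pairs:
        counts[form][lemma] += 1
    return {
        form: lemmas.most_common(1)[0][0]
        for form, lemmas in counts.items()
        if sum(lemmas.values()) >= min_count
    }


class HybridLemmatiser:
    """Tries rules, then the dictionary, then the neural fallback, in that order."""

    def __init__(self, dictionary: Dict[str, str], neural: Callable[[str], str]):
        self.dictionary = dictionary
        self.neural = neural  # placeholder for a trained seq2seq model with attention

    def lemmatise(self, token: str) -> Tuple[str, str]:
        """Return (lemma, stage) so that accuracy can be reported per stage."""
        lemma = rule_lemma(token)
        if lemma is not None:
            return lemma, "rules"
        if token in self.dictionary:
            return self.dictionary[token], "dictionary"
        return self.neural(token), "neural"


def accuracy_by_stage(model: HybridLemmatiser, gold: List[Tuple[str, str]]) -> Dict[str, float]:
    """Overall accuracy plus accuracy for each stage that handled at least one token."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for form, gold_lemma in gold:
        lemma, stage = model.lemmatise(form)
        for key in ("overall", stage):
            totals[key] += 1
            hits[key] += lemma == gold_lemma
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy training pairs (hypothetical); "рече" occurs once, so it stays out of the dictionary.
    train = [("бога", "богъ"), ("бога", "богъ"), ("рече", "рещи")]
    model = HybridLemmatiser(build_dictionary(train), neural=lambda t: t)  # identity stand-in
    gold = [("бога", "богъ"), (".", "."), ("12", "12"), ("глаголаше", "глаголати")]
    print(accuracy_by_stage(model, gold))
    # prints {'overall': 0.75, 'dictionary': 1.0, 'rules': 1.0, 'neural': 0.0}
```

Returning the stage alongside the lemma makes it easy to report overall accuracy separately from accuracy on the token classes handled by rules, which is the kind of comparison the abstract draws between overall and token-specific results.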
Similar publications
Practicamos el Subjuntivo
Bocharov Y., M.: -, 2025.
This textbook is designed for students improving their Spanish proficiency at levels B1-B2. It consists of five topics and a selection of texts to reinforce them. The first topic covers the morphology of the four tenses (present, perfect, imperfect, subjunctive perfect) and exercises on the formation of forms. The remaining topics are devoted to exploring ...
Added: May 23, 2026
Эстетика аудиовизуальной журналистики. Учебное пособие. 2-е издание
Novikova A., Бережная М. А., Кирия И. В., КноРус, 2026.
The aesthetics of journalism is substantiated as a necessary component in the professional training of specialists in audiovisual media. The factors and trends of historical and current changes in the aesthetics of journalism are presented, and the aesthetic practices of audiovisual journalism are characterized in terms of their social functioning. Criteria for aesthetic evaluation are ...
Added: May 22, 2026
Juxtapositional vs. possessive-like encoding in Russian specificational constructions
Logvinova N., Russian linguistics 2026 Vol. 50 Article 11
This paper presents the first in-depth corpus-based study of a previously overlooked syntactic variation in Russian: the competition between juxtapositional (Nominative) and possessive-like (Genitive) encoding of the second noun (the term) in specificational constructions (e.g., ponjatie čest’ (notion.NOM honor.NOM) vs. ponjatie česti (notion.NOM honor.GEN) ‘the notion of honor’). While typological research has established cross-linguistic preferences for one encoding strategy over another, intralinguistic variation ...
Added: May 18, 2026
FOCUS ON VOCABULARY Экономика материальных и нематериальных активов: корпусный словарь и ИИ-упражнения по английскому языку
Gorina O. G., Kucherenko S., Larisa K. et al., St. Petersburg: Asterion, 2026.
This textbook is an integrated teaching and learning resource for English for Specific Purposes (ESP) in the field of economics of tangible and intangible assets. Its design employs (i) modern corpus linguistics methods, including frequency analysis and keyword extraction based on authentic texts reflecting current trends in professional discourse, and (ii) artificial intelligence technologies for ...
Added: May 16, 2026
КОГНИТИВНО-АССОЦИАТИВНОЕ ПОЛЕ ОНИМОВ САНКТ-ПЕТЕРБУРГА И ВЕНЫ
Зелинская Ю. Ю., Когнитивные исследования языка 2025 № 4(65) С. 180–186
The article focuses on the study of the onym as a cognitive stimulus that facilitates the decoding of the language of urban space across two ethnic groups. The research is grounded in the analysis of results from an onomastic associative experiment, aimed at identifying the dominant types of associative responses to anthroponyms, oikodonyms, hodonyms, and ...
Added: May 16, 2026
Лично-числовая асимметрия: согласование пассивных миративов в казымском диалекте хантыйского языка
Starchenko A., Toldova S., Типология морфосинтаксических параметров 2023 Т. 6 № 1 С. 130–148
The study focuses on a previously unrecorded model of split agreement in the mirative paradigm in Kazym Khanty. Split agreement is found when comparing active and passive mirative constructions, as well as in a limited set of uses of non-finite forms. In the passive voice, unlike the active voice, the 3rd person is unmarked and the ...
Added: May 14, 2026
Глаголы перемещения веществ в славянских языках
Fedorov D., Jezikoslovni Zapiski 2026 Т. 32 № 1 С. 23–52
This article describes verbs denoting motion of liquid and dry substances in Slavic languages. The research explores how Slavic languages lexicalize different situations within the semantic field of substance motion and identifies the parameters that drive this lexicalization (e.g., type of substance, intensity and quantization of flow, and causation). Adjacent grammatical phenomena such as argument ...
Added: May 13, 2026
Образ женщины сквозь года: диахронический анализ репрезентации женщин в российской агитационной рекламе
Gabrielova E., Максименко О. И., Социальные и гуманитарные науки на Дальнем Востоке 2026 Т. 23 № 1 С. 241–249
The article presents a diachronic analysis of the representation of women in Russian advertising, based on agitation posters from 1917-1990 and social and motivational advertising materials from 2000-2020. The aim of the study is to identify the evolution of verbal and visual strategies for constructing the image of women in the changing socio-political and cultural ...
Added: May 13, 2026
Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing
Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.
The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...
Added: May 12, 2026
«Плоский мир» Т. Пратчетта глазами русскоязычного фандома
Кульков А. Н., Tsvetkova M. V., Вестник Томского государственного университета. Филология 2026 № 100 С. 158–173
For the first time, an attempt is made to examine the features of fan fiction in Russia as an act of productive reception arising from Terry Pratchett's Discworld series of novels. The analysis shows that fan fiction authors primarily seek to convey the style and comic spirit of Pratchett's original series, regardless of the genre and format of their own works. Fic writers most often turn to such formats as ...
Added: May 10, 2026
Вселенная Достоевского
Pershkina A., М.: Альпина нон-фикшн, 2026.
Philologist Anastasia Pershkina explains how the writer created his world, whom he populated it with, what laws he established in it, and why this world affects us so vividly. You will also learn who helped Fyodor Mikhailovich in his work, how the writer linked his works to one another, what his contemporaries thought of his texts, and what "dostoevshchina" actually is. ...
Added: May 6, 2026
The hypothesis of dependence of the lexical nature of mixed languages on the patterns of their emergence
Gridneva E., Vestnik Tomskogo Gosudarstvennogo Universiteta, Filologiya 2026 No. 100 P. 38–52
This study investigates mixed languages, with a specific focus on their lexical characteristics. It proposes and substantiates the hypothesis that the degree of lexical mixing in such languages — reflected in the prevalence of doublets and the distribution of vocabulary between source languages — is linked to the specific pattern of their emergence, rather than ...
Added: May 6, 2026
Школьный литературный канон эмиграции 1918–1939 гг.
Strizhkova D., / Institute of Russian Literature (Pushkin House), Russian Academy of Sciences. Series B001 "Repository of Open Data on Russian Literature and Folklore". 2026.
The database contains an index of Russian-language literary works and excerpts printed in literature textbooks, anthologies, readers, and collections of poems and stories published in France, Germany, Latvia, Estonia, Bulgaria and Serbia during the first wave of Russian emigration, from 1918 to 1939. The dataset will be of interest to researchers of the school literary canon, emigration and children's reading ...
Added: April 22, 2026
Современная российская мультипликация как инструмент воспитания традиционных духовно-нравственных ценностей
Жигунов А. Ю., / Basic Research Programme. Series HUM "Humanities". 2026. № 1.
The article attempts to describe the features of the educational potential of Russian animation programmes in terms of the representation of traditional spiritual and moral values. Based on media and semiotic analysis and the method of cultural and historical interpretation, animated Russian projects created from 2000 to 2025, which were broadcast on television channels or streaming ...
Added: April 19, 2026
Дискриминативная лемматизация сокращений в эпоху LLM
Глазкова А. В., Смаль И. В., Lyashevskaya O. et al., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 527 С. 146–155
This paper presents a study on the effectiveness of discriminative methods for abbreviation lemmatization in Russian texts. Unlike generative approaches, discriminative models select the optimal lemma from a fixed set of candidates, eliminating the risk of generating grammatically incorrect word forms. For the first time in Russian language processing, we conduct a comprehensive analysis of ...
Added: March 10, 2026
Rubic2: Ensemble Model for Russian Lemmatization
Afanasev I., Glazkova A., Lyashevskaya O. et al., in: Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025). Association for Computational Linguistics, 2025. P. 157–170.
Pre-trained language models have significantly advanced natural language processing (NLP), particularly in analyzing languages with complex morphological structures. This study addresses lemmatization for the Russian language, the errors in which can critically affect the performance of information retrieval, question answering, and other tasks. We present the results of experiments on generative lemmatization using pre-trained language ...
Added: March 10, 2026
Transformer-based approaches for lemmatizing abbreviations in Russian texts
Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47
This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...
Added: March 10, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Building a Clean Bartangi Language Corpus and Training Word Embeddings for Low-Resource Language Modeling
Shumen: INCOMA Ltd, 2025.
This paper introduces a rule-based lemmatization and word embedding pipeline for the endangered Bartangi language, part of the Pamiri language group. The system combines a manually constructed lemma dictionary with morphological suffix rules to improve linguistic consistency in low-resource settings. The results demonstrate enhanced lemmatization accuracy and higher-quality embeddings for downstream NLP tasks. The work ...
Added: October 20, 2025
Политическая аккомодация культурных различий в индустриально развитых обществах (Political Accommodation of Cultural Differences in Industrialized Societies)
Малахов В. С., Симон М. Е., Летняков Д. Э. et al., / SSRN. Series Social Science Research Network "Social Science Research Network". 2020.
The notion of “political accommodation” applied to the theory and practice of managing cultural diversity could enrich the Russian academic dictionary. Liberal democratic states invented specific mechanisms for political accommodation of cultural differences. Thanks to these mechanisms, the part of the population of a democratic state that is not ready to dissolve into the ethnocultural ...
Added: September 26, 2025
Национальная мощь современных государств: сравнительный анализ. Аналитический доклад
Melville A. Y., Каберник В. В., Mironyuk M. et al., / МГИМО МИД России. 2024.
This analytical report is one of the outputs of research conducted within the consortium of HSE University and MGIMO. It first addresses the conceptualisation of national power and related categories and reviews precedents. It then considers the operationalisation of the proposed components of national power. The following sections analyse the methodology used in the report. On this basis, the report proposes ...
Added: September 19, 2025
Автоматическая саммаризация родительских чатов в WhatsApp
Dmitrieva K., Жолус М. Р., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2025 Т. 23 № 1 С. 80–92
Automatic text summarization is one of the main tasks of natural language processing (NLP), which consists in creating a shorter version of the source text. In today’s world the amount of information consumed by people is constantly increasing, therefore more and more emphasis is being placed on the task of summarization. There are two main approaches ...
Added: July 8, 2025
Методы и средства извлечения терминов из текстов для терминологических задач
Bolshakova E. I., Семак В. В., Программные продукты и системы 2025 Т. 38 № 1 С. 5–16
The current state in the field of automatic term extraction from specialized natural language texts, including scientific and technical documents, is considered. Practical applications of methods and tools for extracting terms from texts include creation of terminological dictionaries, thesauri, and glossaries of problem-oriented domains, as well as extraction of keywords and construction of subject ...
Added: July 2, 2025
Automation of Forensic Authorship Attribution: Problems and Prospects
Romanova T. V., Khomenko A., Legal Issues in the Digital Age 2022 Vol. 3 No. 2 P. 90–115
The article deals with validation of an integrative attribution algorithm based on the analysis of the author’s idiostyle using methods of interpretative linguistics with objectification of the available data with the help of mathematical statistics. The algorithm addresses the identification problem of the attribution. The choice of parameters describing the individual style of ...
Added: March 12, 2025