Text collections for evaluation of Russian morphological taggers

Lyashevskaya O., Bocharov V., Sorokin A., Shavrina T., Granovsky D., Alexeeva S.

doi:10.1515/jazcas-2017-0035

Jazykovedny Casopis. 2017. Vol. 68. No. 2. P. 258–267.

The paper describes the preparation and development of the text collections for the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified, high-quality tagging to a single format (Universal Dependencies CoNLL-U). The data sources were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data, and the GICR corpus with resolved homonymy, all of which differ in tagsets, lemmatization rules, pipeline architecture, technical solutions, and error systematicity. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.
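As an illustration of the target format named in the abstract, the sketch below reads the standard ten-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). It is a minimal sketch and not the organizers' tooling: the function names `parse_feats` and `parse_conllu` and the sample sentence are invented for this example, and the shared-task files themselves may use a reduced set of columns.

```python
# Standard CoNLL-U column names, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment, skipped here
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        token = dict(zip(FIELDS, cols))
        token["feats"] = parse_feats(token["feats"])
        current.append(token)
    if current:                       # file may lack a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sample, written for this sketch (not taken from the corpora).
SAMPLE = (
    "# text = Мама мыла раму\n"
    "1\tМама\tмама\tNOUN\t_\tAnimacy=Anim|Case=Nom|Gender=Fem|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tмыла\tмыть\tVERB\t_\tAspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act\t0\troot\t_\t_\n"
    "3\tраму\tрама\tNOUN\t_\tAnimacy=Inan|Case=Acc|Gender=Fem|Number=Sing\t2\tobj\t_\t_\n"
)

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["lemma"], token["upos"], token["feats"].get("Case"))
```

For a morphological tagging evaluation, the LEMMA, UPOS, and FEATS columns are the ones that matter most; the syntactic columns can be left as `_` when a source corpus carries no dependency annotation.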

Research target: Computer Science; Philology and Linguistics

Priority areas: humanities

Keywords: natural language processing; lexico-grammatical corpus annotation; morphological tagging; universal dependencies; text collection; shared task; Russian corpora; evaluation of automatic text processing methods

The 12th International Conference on Information Technology and Quantitative Management (ITQM 2025)

Netherlands: ScienceDirect, 2025.

No ...

Added: June 28, 2026

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue". Issue 24.

M.: Max press, 2026.

The volume includes 64 papers from the international conference on computational linguistics and intelligent technologies 'Dialogue 2026,' representing a broad spectrum of theoretical and applied research in the field of natural language description, language process modeling, and the development of practically applicable computational linguistic technologies. For specialists in theoretical and applied linguistics and intelligent technologies. ...

Added: June 27, 2026

Object-centric process management: A research manifesto

Seidel A., Weske M., Montali M. et al., Information Systems 2026 Vol. 141 Article 102728

Business process management employs process models and event logs to represent the behavior of the information systems under study. Traditional case-centric notions consider the order of activities and events in isolated process instances. The emerging field of object-centric processes challenges this assumption by putting objects in the center. Object-centric process mining and modeling approaches identify ...

Added: June 27, 2026

2024 26th International Conference on Digital Signal Processing and its Applications (DSPA)

IEEE, 2024.

A.S. Popov Russian Science and Technical Society with support from V. A. Trapeznikov Institute of Control Sciences, V.A. Kotelnikov Institute of Radio Engineering and Electronics, Autex Ltd. is leading the XXVIII International Conference "Digital Signal Processing and its Applications: DSPA-2024" ...

Added: June 27, 2026

Developing Methods for Assessing the Quality of Experience (QoE) of Streaming Video

Ivchenko A., Дворкович А. В., Телекоммуникации 2020 Vol. 12 P. 2–11

Dynamic Adaptive Streaming over HTTP (DASH) technology powers most multimedia services. Its specific features (re-buffering, quality switching, etc.) necessitate the development of specialized methods for assessing user subjective quality of experience (QoE) based on objective parameters. This article examines the impact of various metrics on QoE and presents assessment models with Spearman correlation coefficients up ...

Added: June 27, 2026

An Event-Driven Platform for Integrating Machine Vision Components with an Operations Center

Gadzhimirzaev S., Хельвас А. В., 2023 3rd International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET) Mohammedia, Morocco 2023 P. 1–6

The article proposes an architecture for an event-driven Emergency Operation Center with a Machine Vision Component. Sources of information are analyzed, and approaches to using machine vision events for detecting and estimating tactical situations are discussed. Messages from Machine Vision Components are converted to the Common Alerting Protocol and processed by the Operation Center environment for tactical situation recognition. ...

Added: June 26, 2026

Discrete Modeling of the Restoration Repair Process for a Road Section

Gadzhimirzaev S., Хельвас А. В., Computer Research and Modeling 2022 Vol. 14 No. 6 P. 1255–1268

This work describes the results of modeling the process of maintaining the readiness of a section of the road network under strikes with specified parameters. A one-dimensional section of road up to 40 km long with a total number of strikes up to 100 during the work of the brigade is ...

Added: June 26, 2026

An Approach to Assessing the Dynamics of the Industry Consolidation Level

Gadzhimirzaev S., Хельвас А. В., Лукьянченко П. П., Computer Research and Modeling 2023 Vol. 15 No. 1 P. 129–140

In this article we propose a new approach to the analysis of econometric industry parameters for the industry consolidation level. The research is based on the simple industry automatic control model. The state of the industry is measured by quarterly obtained econometric parameters from each industry’s company provided by the tax control regulator. An approach ...

Added: June 26, 2026

A Digital Twin of a Fully Automated Warehouse with Deep Racks

Gadzhimirzaev S., Хельвас А. В., International Frequency Sensor Association (IFSA) Publishing, 19-21 February 2025 Granada, Spain 2025 P. 172–176

The paper presents models for an innovative fully robotic warehouse for storing boxed goods. A discrete multiagent simulation of the movement of shuttles in a warehouse for a given sequence of pallet shipments has been implemented. Different strategies for placement of boxes in various areas of a warehouse are evaluated, as well as optimal routing ...

Added: June 26, 2026

Incorporating Scientific Knowledge into Neural Network Density Functionals

Medvedev M., Journal of Chemical Theory and Computation 2026 Vol. 22 No. 9

Density functional theory (DFT) is the workhorse of modern reactions and materials modeling. While the exact functional remains unknown, many approximations to it have been constructed either by hand-crafting functional forms to satisfy exact constraints or by machine learning. In this work, we show how both of these approaches can be fused to build both ...

Added: June 26, 2026

Readers' Walks through "City of Glass": The Art of Orientation. Round Table Proceedings

Shulyatieva D., Венедиктова Т. Д., Анцыферова О. Ю., LITERATURE OF THE AMERICAS 2026 No. 20 P. 84–137

A round table devoted to Paul Auster's (1947–2024) novel "City of Glass" (1985), with the participation of faculty, graduate students, and undergraduates, was held at the Faculty of Philology of Lomonosov Moscow State University on December 8, 2025. The occasion was the fortieth anniversary of the publication of the novel, which later became the first part of Auster's "New York Trilogy". In "City of Glass" the writer managed to combine existential concern with narrative experiment and the conventions of the crime genre ...

Added: June 25, 2026

Modeling a Fully Robotic Warehouse with Deep-Storage Racks

Gadzhimirzaev S., Хельвас А. В., Computer Research and Modeling 2026 Vol. 18 No. 2 P. 423–438

This article presents a model of a fully automated warehouse with deep storage racks designed for boxed goods storage. The study focuses on optimizing warehouse operations through discrete multiagent simulation of shuttle movements for pallet loading and unloading tasks. The authors investigate various product placement strategies, including the Nearest Channel Positioning Algorithm (NCPA), Most Empty ChannelGroup Placement (MECGP), and ...

Added: June 24, 2026

A machine learning dataset on winter roads of Krasnoyarsk Krai, Russia for the forestry and infrastructural projects

Podolskaia E., Sinitsina A., European Journal of Forest Engineering 2026 Vol. 12 No. 1 P. 7–21

Machine learning in transport modeling has become a trend in science and industry. In this paper, we observe its main directions and focus on a dataset of seasonal road creation. Seasonality as a parameter in transport modeling has a significant impact on transport scenarios but is underestimated worldwide and in Russia, despite modern data challenges. ...

Added: June 24, 2026

Field Research on the Forest Nenets Language: The 2024 Expedition to the Purovsky District

Kozlov A., Toldova S., Агичева О. К., Языки и фольклор коренных народов Сибири 2026 No. 57(1) P. 101–112

This article outlines the results of linguistic fieldwork on the Forest Nenets language conducted in the Purovsky District of the Yamalo-Nenets Autonomous Okrug. The research stems from collective expeditions carried out in 2023 and 2024 by researchers and students from HSE University and Lomonosov Moscow State University in the town of Tarko-Sale and the village ...

Added: June 24, 2026

The state and prospects of using virtual reality technologies in sports: a brief review

Atlasov B., Selskiy A., Russian Journal of Information Technology in Sports 2025 Vol. 2 No. 1 P. 13–21

The article examines the current state of the global virtual and augmented reality (VR/AR) technology market in sports, noting its growth, although slower than previously expected. Special attention is paid to the Russian market, where the development of VR technologies in sports lags behind world leaders such as the United States, EU countries and China, ...

Added: June 23, 2026

AI & PDE: ICLR 2026 Workshop on AI and Partial Differential Equations

[s.n.], 2026.

Added: June 23, 2026

The Algerian War and French Literature: The Case of Georges Perec

Kirichenko V., Практики и интерпретации: журнал филологических, образовательных и культурных исследований, Russia 2026 Vol. 11 No. 1 P. 66–91

This article examines an underexplored aspect of French writer Georges Perec’s work: the influence of the Algerian War (1954–1962) on his literary legacy. While his works contain almost no direct mentions of the war, the article analyzes how this traumatic historical context permeates the themes, structure, and style of his writing. It appears that the war provided a ...

Added: June 23, 2026

Abstracts of the Fifteenth Shmelev Readings (On the 100th Anniversary of the Birth of Academician Dmitry Nikolaevich Shmelev): The Life of the Word: The Scholarly Legacy of Academician D. N. Shmelev in the Context of Modernity

Moscow: V. V. Vinogradov Russian Language Institute of the Russian Academy of Sciences, 2026.

A collection of abstracts of the Fifteenth Shmelev Readings (on the 100th anniversary of the birth of Academician Dmitry Nikolaevich Shmelev), "The Life of the Word: The Scholarly Legacy of Academician D. N. Shmelev in the Context of Modernity". It covers various aspects of contemporary Russian studies, from historical lexicology to current transformations in the pragmatics and semantics of words. ...

Added: June 23, 2026

2025 9th International Conference on Information, Control, and Communication Technologies (ICCT-2025)

IEEE, 2026.

The 9th International Scientific Conference on Information, Control, and Communication Technologies (ICCT-2025) was held on October 7-11, 2025, in Gomel, Belarus. The main technical areas and applications covered by the proceedings are optoelectronics, acousto-optic, microwave technology, antenna systems, measuring technology, metamaterials, nanostructures, nanofilms, photonic crystals, biology and medicine, biophotonics, bioengineering, neural networks in communication technologies; ...

Added: June 23, 2026

A System of Syntactic Invariants of Text Activity: Statistical Descriptors, Semantic Structure, and Diagnostic Profiles

Kudriavtseva E., / РЦИС. Series No. 0148-756-286. 2026.

The work presents a system for identifying four types of written speech structures. A set of 11 calculated parameters, statistical standards, and semantic characteristics allows for the identification of a text's structure as the result of a specific cognitive schema (scene, event, story, evaluation). The method has been verified ...

Added: June 2, 2026

Why Growing Incomes Do Not Make People Happier: an Emotional Explanation of the Easterlin Paradox

Vorchik A., / SSRN. Series Social Science Research Network "Social Science Research Network". 2026.

This work is devoted to a theoretical explanation of the Easterlin paradox, according to which long-term economic growth does not raise people's average level of happiness. By happiness, we mean the intensity of emotions people experience while comparing their new income with its expected value, or the target income with its original value. In the first case, ...

Added: May 31, 2026

The School Literary Canon of the Emigration, 1918–1939

Strizhkova D., / Institute of Russian Literature (Pushkin House) of the Russian Academy of Sciences. Series B001 "Repository of Open Data on Russian Literature and Folklore". 2026.

The database catalogs Russian-language literary works and excerpts printed in literature textbooks, anthologies, readers, and collections of poems and short stories published in France, Germany, Latvia, Estonia, Bulgaria, and Serbia during the first wave of Russian emigration, from 1918 to 1939. The dataset is of interest to researchers of the school literary canon, emigration, and children's reading ...

Added: April 22, 2026

Contemporary Russian Animation as a Tool for Instilling Traditional Spiritual and Moral Values

Жигунов А. Ю., / Basic Research Programme. Series HUM "Humanities". 2026. No. 1.

The article attempts to describe the educational potential of Russian animation programmes with respect to the representation of traditional spiritual and moral values. Based on media and semiotic analysis, the method of cultural and historical interpretation, animated Russian projects created from 2000 to 2025, which were broadcast on television channels or streaming ...

Added: April 19, 2026

Transformer-based approaches for lemmatizing abbreviations in Russian texts

Glazkova A., Lyashevskaya O., Morozov D. et al., Journal of Mathematical Sciences 2025 Vol. 546 P. 32–47

This paper addresses the task of lemmatizing abbreviations in the Russian language. Abbreviation lemmatization is particularly challenging, as it involves not only transforming a word into its normal form but also correctly expanding the abbreviation. We explore two approaches to this task, both leveraging large pretrained language models. The first approach is generative, where the ...

Added: March 10, 2026