On the Task of Automatic Lexico-Grammatical Tagging of the Middle Russian Corpus of the 15th–17th Centuries

Т. С. Гаврилова; Т. А. Шалганова; О. Н. Ляшевская



Вестник Православного Свято-Тихоновского гуманитарного университета. Серия 3: Филология. 2016. Vol. 47. No. 2. P. 7–25.


The paper discusses two approaches to the automatic lexico-grammatical tagging of Middle Russian texts (1400–1700) included in the Russian National Corpus (RNC). The task is to assign each token a part-of-speech label, a tuple of grammatical features, and a lemma (without disambiguation). Middle Russian combines, on the one hand, features of the earlier state of the grammatical system, including aorist and imperfect verb forms, the dual number, and a number of archaic inflectional paradigms, and, on the other hand, features of modern Russian inflectional morphology. The lexicon shows the same mix of Old Russian and Modern Russian lemmas. Moreover, the texts can contain Church Slavonic and dialectal forms. The absence of a standardised orthography and of a standard variety poses further challenges for processing Middle Russian texts. The first approach is based on compiling an electronic dictionary of Old Russian and building a module to handle spelling inconsistency. In the absence of open electronic resources for Middle Russian morphology, an electronic dictionary of Church Slavonic was expanded and adapted to Middle Russian. The paper describes the steps required to modify the nominal and verbal entries in this dictionary. We follow the principle of «wider expansion», under which the analyser is allowed to generate as many analyses as possible so that at least one of them is correct. The second approach uses, first, an existing Modern Russian tagger supplemented by a module that reduces spelling variation, and second, a database of lexico-grammatical annotations retrieved from the Diachronic Corpus of the RNC. We evaluate the output of both analysers against manually annotated test data. We also discuss the benchmark scores and outline prospects for the further development of Middle Russian taggers.

Research target: Philology and Linguistics

Priority areas: humanitarian

Language: Russian


Keywords: Russian National Corpus, Old Russian, lexico-grammatical tagging, morphological analysis, Middle Russian, grammatical dictionary, spelling variation, verb inflection, nominal inflection, Middle Russian writing, Middle Russian corpus, morphological tagger

TERRITORIAL VARIATION OF THE OCCITAN LANGUAGE: CLASSIFICATION OF THE NORTHERN DIALECTS

Бестолкова Г. В., Теория языка и межкультурная коммуникация 2023 No. 3 (50) P. 1–15

The variety of dialects, subdialects and colloquial speech plays a significant role in the development of modern Occitan, which makes the study undertaken in this article relevant. Because Occitan has a large number of dialects, only its northern dialects are considered in detail here. The material contained in the article allows one to form a ...

Added: February 15, 2026

OCCITAN LANGUAGE IN FRANCE: HISTORICAL RETROSPECTIVE

Bestolkova G. V., Теория языка и межкультурная коммуникация 2023 No. 2 (49) P. 13–21

The article offers a descriptive analysis of the features of Occitan (terminology, current status, linguistic features) and of its interaction with French within France (francitan, français d’oc). The features of Occitan considered in the article can serve as a basis for fundamental research as well as for courses on the history of the Romance languages, Romance dialectology, Occitan and ...

Added: February 15, 2026

"Marble... Mrakomor? A Requiem for Mayakovsky in Kruchenykh's Workbook"

Khachaturyan L., Новое литературное обозрение 2025 Vol. 195 No. 5 P. 195–205

Drawing on Aleksei Kruchenykh's unpublished workbooks of 1927–1930 (RGALI), the article examines the communicative structure of texts of the first avant-garde. It investigates the mechanism of co-authorship in the creative tandem of Kruchenykh and Khlebnikov (1912–1914), which served as the basis for Kruchenykh's subsequent independent work on the poetic texts of his "manuscript period". The creation of the requiem for Mayakovsky («Уже роковистые тени заката…», April 1930) and of Kruchenykh's own related self-epitaphs (April to June 1930) is commented on step by step. ...

Added: February 12, 2026

"The Communicative Paradox: The Notebooks and Workbooks of the 'Big Three' of the Russian Avant-Garde"

Khachaturyan L., Studia Litterarum 2025 Vol. 10 No. 2 P. 358–381

The article offers a comparative analysis of the workbooks and notebooks of Vladimir Mayakovsky, Velimir Khlebnikov and Aleksei Kruchenykh. Drawing on draft notes, most of which remain unpublished to this day, it reveals common techniques for creating a literary text that go far beyond the documented co-authorship of 1912–1914. To explain this phenomenon, the study returns to the classical model of the communicative structure ...

Added: February 12, 2026

Phonetic clustering characteristics in verbal fluency: A potential marker for differentiating subjective cognitive decline from mild cognitive impairment

Cherkasov N., Rodionova E., Зверева А. М. et al., Applied Neuropsychology: Adult 2026 P. 1–11

Objective: Semantic and phonemic verbal fluency (VF) tasks are widely used to assess older adults’ cognition in clinical practice. Typical scoring only analyses the total number of correct words produced. We investigated whether differentiation between individuals with subjective cognitive decline (SCD) versus mild cognitive impairment (MCI), which is often challenging, could be enhanced by also assessing ...

Added: February 12, 2026

SCIENTIFIC START-2024

Moscow: Moscow City University (МГПУ), Языки народов мира, 2024.

The collection presents papers prepared by doctoral students, master's students and external degree candidates within the event "Scientific Start 2024 (with elements of a research school)". It addresses current problems of linguistics, literary studies and language pedagogy that fall within the research interests of the scholarly schools of the Institute of Foreign Languages of Moscow City University. The materials may be of use to anyone interested in problems of language, foreign literature and language pedagogy. ...

Added: February 12, 2026

Development of a Language Model for Automated Classification of English-Language Scientific Articles by SRSTI Codes

Zunin V., Afonin A. I., Anoshin V. I. et al., Automatic Documentation and Mathematical Linguistics 2025 Vol. 59 No. 5 P. 287–293

The article describes the development of an artificial intelligence-based language model for classifying English-language scientific articles by SRSTI codes, which improves the reviewing and indexing of scientific publications. A pre-processed dataset of scientific articles was used for training and testing the models. An architecture for cascade classification was developed, and the performance of models with ...

Added: February 11, 2026

The Technical Imagination of Alexander Kondratov

Rodionova A., Новое литературное обозрение 2025 No. 195 P. 303–311

The article analyzes the poetic method of Alexander Kondratov (1937–1993), an unofficial author, researcher, and popularizer of cybernetics from Leningrad. The author addresses his concept of poetic creativity, his projects and archives, and his popular-science and poetic texts. The article also examines the role of combinatorics in his work in the context of inheritance from the historical avant-garde and ...

Added: February 11, 2026

A genre-based model of rhetorical structure in scoping review introductions

Tikhonova E. V., Kosycheva M. A., Training, Language and Culture 2025 Vol. 9 No. 4 P. 35–55

As genre modelling advances, describing the rhetorical structures of research articles becomes crucial. Though secondary to empirical studies, scoping reviews shape scholarly communication by framing analysis and setting epistemological benchmarks. Their introductions act as conceptual lenses, defining interpretive frameworks. However, most rhetorical models, designed for empirical articles, appear to be inadequate for scoping reviews. We propose a ...

Added: February 9, 2026

Difference in Language Profiles of Children With Autism Spectrum Disorder and Down Syndrome Is Not Driven by Non-Verbal Cognition

Novoselova K., Lopukhina A., Gomozova M. et al., International Journal of Language and Communication Disorders 2026 Vol. 61 No. 1 P. 1–14

Background: Autism Spectrum Disorder (ASD) and Down syndrome (DS) are among the most common types of neurodevelopmental conditions that have co-occurring language impairments. Non-verbal IQ has often been reported as one of the main predictors of language functioning in children with these conditions. Although language abilities of children with ASD and DS have been described in ...

Added: February 6, 2026

The Role of Women Scientists in the Development of Science and Education: A Collection of Papers by Participants of the International Forum of Women Scientists Dedicated to the 105th Anniversary of BSU

Minsk: РИВШ, 2026.

The collection presents research papers by women scientists and university teachers who took part in the International Forum of Women Scientists, organized for the 105th anniversary of the Belarusian State University by the BSU primary organization of the public association "Belarusian Union of Women". It includes papers by authors from Belarus, Russia, China, Kyrgyzstan, Azerbaijan, India and Iraq, specialists in biology, design, journalism, cultural studies, medicine, management, pedagogy, psychology, sociology, physics, philology, philosophy, ...

Added: February 6, 2026

The Role of Allusion as an Element of Intertextuality in Detective Discourse (Based on E. C. Bentley's Story "The Genuine Tabard")

Ovodova M., Вестник Воронежского государственного университета 2024 No. 3 P. 96–103

The article examines intertextual elements in detective discourse. As a “dialogue” of two texts, intertextuality has a two-dimensional structure, divided into material and thematic. These types of intertextuality are expressed in borrowings of elements of the plane of expression (a word or its variant) and of the plane of content ...

Added: February 5, 2026

Similarities and Differences between Biblical Precedent Expressions and Biblical Phraseological Units as Intertextual Markers in the Detective Story

Ovodova M., Вопросы психолингвистики 2025 Vol. 66 No. 4 P. 80–95

The article focuses on several points: first, the study of phraseological units and precedent expressions; second, the elaboration of possible distinctions between them, since these linguistic units are close to each other. Phraseological units are structurally separable units that are stable and reproducible and that have absorbed implicit cultural meanings. ...

Added: February 5, 2026

Principles of Differentiation and Features of the Functioning of Intertextual Markers in Detective Stories (Based on Stories by E. C. Bentley and G. K. Chesterton)

Ovodova M., Иностранные языки в высшей школе, Russia 2025 Vol. 72 No. 1 P. 29–38

The paper examines the phenomenon of intertextuality in seven detective stories by E. C. Bentley and G. K. Chesterton. The research focuses on the interaction of biblical allusions with biblical precedent expressions and names, and on their decoding in detective discourse. The problem lies in the differentiation of ...

Added: February 5, 2026

Verbalization of the Biblical Motif of "Pride" in Detective Discourse (Based on G. K. Chesterton's Story "The Hammer of God")

Ovodova M., Вестник Московского государственного лингвистического университета. Гуманитарные науки 2023 Vol. 870 No. 2 P. 85–91

The article presents a linguostylistic and intertextual analysis of a detective story by G. K. Chesterton, focusing on how biblical motifs are verbalized in the detective text. The motif, which has a symbolic function, usually manifests itself in a text through allusions and individual linguistic units whose combination produces an indelible effect on readers. One ...

Added: February 5, 2026

Nominative Object

Ronko R., Wiemer B., in: Encyclopedia of Slavic Languages and Linguistics Online. Brill, 2020.

The nominative object describes a clause type in which the object of a transitive verb takes nominative morphology, and this coding is not conditioned by voice operations. It is a salient property in regions in which Slavic varieties have been in contact with Finnic- and/or Baltic-speaking populations, i.e., in the eastern part of the Circum-Baltic ...

Added: December 19, 2025

Political Accommodation of Cultural Differences in Industrialized Societies

Малахов В. С., Симон М. Е., Летняков Д. Э. et al., / SSRN. Series "Social Science Research Network". 2020.

The notion of “political accommodation”, applied to the theory and practice of managing cultural diversity, could enrich the Russian academic vocabulary. Liberal democratic states have developed specific mechanisms for the political accommodation of cultural differences. Thanks to these mechanisms, the part of the population of a democratic state that is not ready to dissolve into the ethnocultural ...

Added: September 26, 2025

The Twofold Nature of Old East Slavic Iže

Anna A. Fitiskina, Russian Linguistics 2025 Vol. 49 Article 4

This paper aims to demonstrate that the Old East Slavic pronoun iže, traditionally considered a loanword from Old Church Slavonic and a marker of literacy, was in fact also widely used in secular texts of the earliest period and that its usage there differed considerably from that found in Old East Slavic church-oriented literature. The ...

Added: September 26, 2025

Birchbark Letters from the 2024 Excavations. I. Veliky Novgorod, the Troitsky Excavation

Gippius A., Вопросы языкознания 2025 No. 4 P. 7–41

This article contains a preliminary publication of 30 birchbark letters found during the 2024 archaeological season at the Troitsky excavation in Veliky Novgorod. The vast majority of the published texts date back to the 12th century. Most important in historical and philological terms are the following items: a letter mentioning a military campaign and related ...

Added: September 21, 2025

National Power of Modern States: A Comparative Analysis. Analytical Report

Melville A. Y., Каберник В. В., Mironyuk M. et al., / MGIMO University of the Ministry of Foreign Affairs of Russia (МГИМО МИД России). 2024.

This analytical report is one of the research outputs of the consortium of HSE University and MGIMO. It first addresses the conceptualization of national power and related categories and reviews precedents. It then considers the operationalization of the components of national power proposed by the authors. The following sections of the report analyze the methodology used in it. On this basis, the report proposes ...

Added: September 19, 2025

On the Russian National Corpus

Rakhilina E. V., Вестник Российской академии наук 2024 Vol. 94 No. 9 P. 795–803

The article is devoted to the project of creating the Russian National Corpus (RNC), a powerful reference and information system for the Russian language developed by a consortium of institutions of the Russian Academy of Sciences with the participation of Yandex. It describes the history of the Corpus, its main functionality and ways of improving it, as well as its most technologically advanced subcorpora (poetic, parallel and multimedia), with examples of how they work. Particular attention is paid to the latest developments, which ...

Added: February 25, 2025

Explicit continuum scale format reduces the ceiling effect in self-report questionnaires comparing to Likert response format

Antipkina I., Ivanov A., Guzhelya D., / Series WP BRP "Basic research program". 2024.

This study presents a methodology for developing a new questionnaire format called explicit continuum scenario scales, using the example of a client-focus questionnaire. Elements of the Rasch-Guttman scenario scale methodology were used in its development. Three consecutive studies investigated different aspects of how the scale functions. In Study 1, on the sample ...

Added: February 21, 2025

Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?

Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84

Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...

Added: January 7, 2025

Corpus Linguistics at the Present Stage

Plungian V., Вестник Российской академии наук 2024 Vol. 94 No. 9 P. 787–794

The article gives a general overview of corpus linguistics: its history, its methods, and its influence on current views of how language should be studied, an influence usually referred to as the “corpus revolution”. ...

Added: December 16, 2024