Взiaлъ, възялъ, вьзял: Обработка орфографической вариативности при лексико-грамматической аннотации старорусского корпуса XV-XVII вв.

Т. С. Гаврилова; Т. А. Шалганова; О. Н. Ляшевская

doi:10.15382/sturIII201751.11-20

Вестник Православного Свято-Тихоновского гуманитарного университета. Серия 3: Филология. 2017. Vol. 51. P. 11–20.

The highly unstable orthography of Middle Russian texts poses a challenge for their automatic processing. The Middle Russian subcorpus of the Russian National Corpus (RNC) includes documents written mainly between 1400 and 1700, when variation in spelling was still the norm. The task of lexico-grammatical analysis is to assign a dictionary form (lemma), a part of speech, and grammatical tags to each word form in the corpus. Traditional methods of part-of-speech and grammatical tagging assume that (almost always) only one string of characters can represent the stem and ending of each grammatical form of a word. Since unstable orthography yields a many-to-many mapping between word forms and grammatical annotations, morphological taggers perform poorly and need orthographic normalization as a preprocessing step.

We use both relative and absolute normalization of orthographic representations. Relative normalization multiplies the orthographic representations of stems and endings in the grammatical dictionary by means of regular rules, so that each entry is stored with its regular spelling variants. It is carried out at the level of (a) word endings; (b) nominative stems with regular variation, e.g. russk(ij) / russt(ij), keli(ja) / kel'(ja); (c) nominative stems of Church Slavonic origin, e.g. odin- / edin-; (d) verb stems with prefixes; etc. Absolute normalization matches characters (or character combinations) that alternate regularly in the corpus (e.g. o / ѡ 'omega', e / ѣ, шт / щ, жю / жу). It applies both to the orthographic representations in the grammatical dictionary and to the word forms in the text.
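The two kinds of normalization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The character pairs are taken from the abstract, but the direction of each mapping is an assumption. The stem-final rule is modelled on the keli(ja) / kel'(ja) example, rendered in Cyrillic. The one-entry dictionary, the tag strings, and all function names are hypothetical.

```python
# Absolute normalization: collapse regularly alternating characters.
# The pairs are the ones listed in the abstract; the direction of each
# mapping (which spelling is the representative) is an assumption here.
ABSOLUTE_RULES = [("ѡ", "о"), ("ѣ", "е"), ("шт", "щ"), ("жю", "жу")]

# Relative normalization: hypothetical stem-final alternation, modelled on
# the keli(ja) / kel'(ja) example and rendered in Cyrillic for this sketch.
STEM_FINAL_RULES = [("и", "ь")]

# Hypothetical toy dictionary: (lemma, stem, {ending: grammatical tag}).
LEXICON = [("келья", "кели", {"я": "S,nom,sg", "ю": "S,acc,sg"})]


def absolute_normalize(form):
    """Lowercase a form and replace each variant with its representative."""
    form = form.lower()
    for variant, representative in ABSOLUTE_RULES:
        form = form.replace(variant, representative)
    return form


def expand_stem(stem):
    """Return the stem together with its rule-generated spelling variants."""
    variants = {stem}
    for a, b in STEM_FINAL_RULES:
        if stem.endswith(a):
            variants.add(stem[: -len(a)] + b)
        elif stem.endswith(b):
            variants.add(stem[: -len(b)] + a)
    return variants


def build_index(lexicon):
    """Map every normalized dictionary form to its (lemma, tag) analyses."""
    index = {}
    for lemma, stem, endings in lexicon:
        for variant in expand_stem(stem):
            for ending, tag in endings.items():
                key = absolute_normalize(variant + ending)
                index.setdefault(key, set()).add((lemma, tag))
    return index


def analyze(word_form, index):
    """Look up a text word form after the same absolute normalization."""
    return index.get(absolute_normalize(word_form), set())


INDEX = build_index(LEXICON)
```

In this sketch, `analyze("келию", INDEX)` and `analyze("кѣлью", INDEX)` both return the single analysis `("келья", "S,acc,sg")`. The second spelling is constructed for illustration. Expanding the dictionary once and normalizing both sides with the same function keeps lookup a plain exact match, which is one simple way to realize the division of labour that the abstract describes.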

Research target: Philology and Linguistics

Priority areas: humanities

Keywords: Национальный корпус русского языка; древнерусский язык; Russian National Corpus; лексико-грамматическая разметка; Old Russian Language; орфографическая вариативность; morphological analysis; Middle Russian; lexico-grammatical tagging; spelling variation; старорусская письменность; orthographic normalization; historical corpus linguistics; исторические корпуса; орфовариант; унификация орфографии при автоматической обработке текста

Difference in Language Profiles of Children With Autism Spectrum Disorder and Down Syndrome Is Not Driven by Non-Verbal Cognition

Novoselova K., Lopukhina A., Gomozova M. et al., International Journal of Language and Communication Disorders 2026 Vol. 61 No. 1 P. 1–14

Background: Autism Spectrum Disorder (ASD) and Down syndrome (DS) are among the most common types of neurodevelopmental conditions that have co-occurring language impairments. Usually, non-verbal IQ has been reported as one of the main predictors of language functioning in children with these conditions. Although language abilities of children with ASD and DS have been described in ...

Added: February 6, 2026

Роль женщин-ученых в развитии науки и образования: сборник научных статей участников Международного форума женщин-ученых, посвященного 105-летию БГУ

Мн.: РИВШ, 2026.

The collection presents research articles by women scientists and lecturers who took part in the International Forum of Women Scientists, organized for the 105th anniversary of Belarusian State University (BSU) by the BSU primary organization of the public association "Belarusian Union of Women". It includes articles by authors from Belarus, Russia, China, Kyrgyzstan, Azerbaijan, India, and Iraq, specialists in biology, design, journalism, cultural studies, medicine, management, pedagogy, psychology, sociology, physics, philology, philosophy, ...

Added: February 6, 2026

Роль аллюзии как элемента интертекстуальности в детективном дискурсе (на базе рассказа Э.К. Бентли «The Genuine Tabard»)

Ovodova M., Вестник Воронежского государственного университета 2024 No. 3 P. 96–103

This article focuses on intertextual elements in detective discourse. As a “dialogue” between two texts, intertextuality is a two-dimensional structure, divided into material and thematic. These types of intertextuality are expressed in borrowings of elements of the plane of expression (a word or its variant) and of the plane of content ...

Added: February 5, 2026

Сходства и различия библейских прецедентных высказываний и библейских фразеологизмов как интертекстуальных маркеров в детективном рассказе

Ovodova M., Вопросы психолингвистики 2025 Vol. 66 No. 4 P. 80–95

This article focuses on several points. First, it examines phraseological units and precedent expressions; second, it elaborates possible distinctions between them, since these linguistic units are close to each other. Phraseological units are structurally separable units that are stable and reproducible and have absorbed implicit cultural meanings. ...

Added: February 5, 2026

Принципы разграничения и особенности функционирования интертекстуальных маркеров в детективных рассказах (на материале рассказов Э.К. Бентли и Г.К. Честертона)

Ovodova M., Иностранные языки в высшей школе, Россия 2025 Vol. 72 No. 1 P. 29–38

The paper deals with the phenomenon of intertextuality, studied using seven detective stories by E. K. Bentley and G. K. Chesterton. The research focuses on the interaction of biblical allusions and biblical precedent expressions and names, and their decoding in detective discourse. The problem resides in the differentiation of ...

Added: February 5, 2026

Вербализация библейского мотива «гордость» в детективном дискурсе (на материале рассказа Г.К. Честертона «The Hammer of God»)

Ovodova M., Вестник Московского государственного лингвистического университета. Гуманитарные науки 2023 Vol. 870 No. 2 P. 85–91

This article presents linguostylistic and intertextual analyses of a detective story by G. K. Chesterton in the context of the verbalization of biblical motifs in the detective text. The motif itself, which has a symbolic function, usually manifests itself in the text through allusions and separate linguistic units, the combination of which produces an indelible effect on readers. One ...

Added: February 5, 2026

Китайский язык: второй иностранный язык: 7-й класс: сборник грамматических упражнений: учебное пособие

Sizova A., Просвещение, 2025.

This grammar practice workbook for Grade 7 learners from the "It's Time to Learn Chinese!" series is designed to strengthen students’ grammatical accuracy across a range of language skills and communicative activities. It offers an extensive collection of practice exercises, complemented by engaging tasks of varying formats and levels of difficulty. In addition, the book ...

Added: February 4, 2026

К вопросу об оправданности заимствования термина "резильентность": на материале текстов экономической профессиональной направленности

Мартынова И. А., Вестник Самарского государственного университета. Гуманитарная серия 2023 Vol. 29 No. 4 P. 156–163

At the present stage of development, various cultures, economies, language systems and societies in general are in close interaction with each other. The processes of borrowing and assimilation of loan words in the language are becoming the most significant ones in the linguistic development of modern society. They bring to the fore the problem of ...

Added: February 4, 2026

Natural Language Processing and Information Systems : 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4-6, 2025 : proceedings. Part I

Springer, 2025.

The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...

Added: February 3, 2026

No ‘iota’ type-shifter in Kazym Khanty

Tiutiunnikova V., Mikhailov Stiopa, Golosov F., Proceedings of Sinn und Bedeutung 2025 No. 29 P. 1593–1608

In this paper, we present new challenging data from Kazym Khanty (a Uralic language spoken in Western Siberia, Russia): in this articleless language, bare singular and bare dual NPs in argument positions can receive indefinite readings on par with definite ones, contradicting the predictions of the classic neo-Carlsonian approach (Chierchia, 1998; Dayal, 2004). We argue ...

Added: January 30, 2026

Употребление порядковых числительных в разных семантических контекстах (на материале параллельных переводов Нового Завета)

Nasledskova P., Известия РАН. Серия литературы и языка 2025 Vol. 84 No. 6 P. 88–102

The paper compares the use of ordinal constructions in different semantic contexts in five languages: Russian, English, Spanish, Indonesian, and Rutul. The comparison is based on parallel translations of the New Testament. From six books of the New Testament (the canonical Gospels, the Acts of the Apostles, and the Revelation of John the Theologian), verses were selected in which an ordinal numeral is used in at least one of the sample languages. ...

Added: January 29, 2026

Применение технологий ИИ в обучении студентов в рамках дисциплины «Академическое письмо на английском языке»

Gabrielova E., Магия ИННО 2025 Vol. 7 No. 1 P. 165–172

Artificial intelligence (AI) technologies are rapidly developing and are being widely applied in various fields, including education. The use of AI carries certain risks; however, one cannot completely reject it in student education. The article presents the experience of using AI in teaching English to 34 fourth-year students and 26 post-graduate students within the discipline ...

Added: January 29, 2026

Explorations in Applied Ethnolinguistics: Words, Cultures, and Global Perspectives

Palgrave Macmillan, 2025.

This volume contributes to the growing body of cutting-edge research into the Natural Semantic Metalanguage (NSM) approach in linguistics. It explores the broad range of possible applications enabled by the NSM approach, from linguistic studies of semantics and culture to cross-cultural studies, psychology and childhood education. The volume builds on previous studies, bringing a diversity ...

Added: January 28, 2026

Эпос о Гильгамеше. Перевод Николая Гумилева. Предисловие Е. Маркиной. Введение В. Шилейко.

Markina E., Манн, Иванов и Фербер, 2025.

Publisher's annotation: The Epic of Gilgamesh is the oldest monument of world literature, which has come down to us from the depths of the Sumerian and Akkadian civilizations. The poem tells of the adventures of the mighty king of the city of Uruk and his friend Enkidu. It is a story of strength and friendship, pride and humility, the fear of death and the thirst for immortality. The poem is published in the translation by the Acmeist poet Nikolay Gumilev, with an explanatory article by Vladimir Shileiko, an Assyriologist and the poet's contemporary, ...

Added: January 28, 2026

The representation of the climate crisis in Croatian online news media

Šarić L., Trnavac R., Frontiers in Communication 2026

This study analyzes agency and news values across verbal and visual modalities in Croatian online news on the climate crisis, examining how climate change is portrayed. We explore newsworthiness, visual framing, and metaphor, linking agency to broader concerns about responsibility. In addition, the analysis traces how different types of agency shape news values in both metaphorical and nonmetaphorical micro-contexts. To ...

Added: January 27, 2026

Semi-fake indexicals in Russian

Тискин Д. Б., Типология морфосинтаксических параметров 2025 Vol. 8 No. 1 P. 112–129

There are several rival theories of fake indexicals, i.e. bound indexicals (prominently pronouns) whose φ-features do not semantically contribute to focus alternatives (e.g. Only Mary did her homework, John didn’t do his). According to Minimal Pronoun theories (such as Kratzer’s or Wurmbrand’s), bound pronouns are Merged without φ-features and acquire them under binding via agreement-like ...

Added: January 26, 2026

Nominative Object

Ronko R., Wiemer B., in: Encyclopedia of Slavic Languages and Linguistics Online. Brill, 2020.

The nominative object describes a clause type in which the object of a transitive verb takes nominative morphology, and this coding is not conditioned by voice operations. It is a salient property in regions in which Slavic varieties have been in contact with Finnic- and/or Baltic-speaking populations, i.e., in the eastern part of the Circum-Baltic ...

Added: December 19, 2025

Политическая аккомодация культурных различий в индустриально развитых обществах (Political Accommodation of Cultural Differences in Industrialized Societies)

Малахов В. С., Симон М. Е., Летняков Д. Э. et al. / SSRN. Series "Social Science Research Network". 2020.

The notion of “political accommodation” applied to the theory and practice of managing cultural diversity could enrich the Russian academic dictionary. Liberal democratic states invented specific mechanisms for political accommodation of cultural differences. Thanks to these mechanisms, the part of the population of a democratic state that is not ready to dissolve into the ethnocultural ...

Added: September 26, 2025

The Twofold Nature of Old East Slavic Iže

Anna A. Fitiskina, Russian linguistics 2025 Vol. 49 Article 4

This paper aims to demonstrate that the Old East Slavic pronoun iže, traditionally considered a loanword from Old Church Slavonic and a marker of literacy, was in fact also widely used in secular texts of the earliest period and that its usage there differed considerably from that found in Old East Slavic church-oriented literature. The ...

Added: September 26, 2025

Берестяные грамоты из раскопок 2024 г. I. Великий Новгород, Троицкий раскоп

Gippius A., Вопросы языкознания 2025 No. 4 P. 7–41

This article contains a preliminary publication of 30 birchbark letters found during the 2024 archaeological season at the Troitsky excavation in Veliky Novgorod. The vast majority of the published texts date back to the 12th century. Most important in historical and philological terms are the following items: a letter mentioning a military campaign and related ...

Added: September 21, 2025

Национальная мощь современных государств: сравнительный анализ. Аналитический доклад

Melville A. Y., Каберник В. В., Mironyuk M. et al. / МГИМО МИД России. 2024.

This analytical report is one of the outcomes of research carried out within the consortium of HSE University and MGIMO. It first addresses the conceptualization of national power and related categories and gives an overview of precedents. It then considers the operationalization of the components of national power that we propose. The following sections of the report analyze questions of the methodology used in the report. On this basis, the report proposes ...

Added: September 19, 2025

О национальном корпусе русского языка

Rakhilina E. V., Вестник Российской академии наук 2024 Vol. 94 No. 9 P. 795–803

The article is devoted to the project of creating the Russian National Corpus (RNC), a powerful reference and information system for the Russian language, developed by a consortium of Russian Academy of Sciences organizations with the participation of the company Yandex. It describes the history of the Corpus, its main functionality and ways of improving it, as well as its most technologically advanced subcorpora (poetic, parallel, multimedia), with examples of how they work. Particular attention is paid to the latest developments, which ...

Added: February 25, 2025

Explicit continuum scale format reduces the ceiling effect in self-report questionnaires comparing to Likert response format

Antipkina I., Ivanov A., Guzhelya D. / Series WP BRP "Basic research program". 2024.

This study presents a methodology for developing a new questionnaire format called explicit continuum scenario scales, using a client focus questionnaire as an example. Elements of the Rasch Guttman scenario scale methodology were used in its development. In three consecutive studies, different aspects of the scale functioning were investigated. In Study 1, on the sample ...

Added: February 21, 2025

Automatic Morpheme Segmentation for Russian: Can an Algorithm Replace Experts?

Morozov D., Garipov T., Lyashevskaya O. et al., Journal of Language and Education 2024 Vol. 10 No. 4 P. 71–84

Introduction: Numerous algorithms have been proposed for the task of automatic morpheme segmentation of Russian words. Due to the differences in task formulation and datasets utilized, comparing the quality of these algorithms is challenging. It is unclear whether the errors in the models are due to the ineffectiveness of algorithms themselves or to errors and inconsistencies ...

Added: January 7, 2025