Complexity of Russian Legal Texts: Assessment Methods and Language Data

Blinova O. V., Tarasov N. A.


P. 175–182.


Our goal is to create a model for automatically assessing the complexity of Russian legal texts. Achieving this goal requires three steps: compiling a text collection, performing linguistic markup, and identifying complexity parameters that fit the selected markup format. This paper describes these steps. We briefly describe three corpora of modern Russian legal texts, “CorRIDA”, “CorDes”, and “CorCodex”, with a total size of 8.5 million tokens. We justify the choice of linguistic markup tools (UDPipe, pymorphy2). We then characterize the linguistic features used for complexity assessment, including: the simplest basic metrics; five readability formulas; parameters for assessing lexical complexity (TTR values, Yule’s K, the number of hapaxes, abbreviations, abstract words, etc.); and parameters for assessing morphosyntactic and discursive complexity (Noun-Verb Ratio values; the number of genitive, neuter, and passive grammemes; relative clauses, appositive modifiers, lexical devices of discursive connectivity, etc.).
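Three of the lexical parameters named in the abstract (TTR, the number of hapaxes, Yule’s K) have standard textbook definitions, so a minimal sketch is possible. It assumes the input is already a list of tokens or lemmas; the function name `lexical_metrics` and the sample lemmas are hypothetical and do not come from the paper, whose own pipeline (UDPipe, pymorphy2) is not reproduced here.

```python
from collections import Counter


def lexical_metrics(tokens):
    """Return TTR, hapax count, and Yule's K for a list of tokens.

    Hypothetical helper for illustration only; it uses the standard
    definitions, not necessarily the exact variants used by the authors.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must be non-empty")

    # Frequency of each type (distinct token).
    freqs = Counter(tokens)
    # Frequency spectrum: spectrum[i] = number of types occurring exactly i times.
    spectrum = Counter(freqs.values())

    ttr = len(freqs) / n      # type-token ratio
    hapaxes = spectrum[1]     # types that occur exactly once
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2
    yule_k = 1e4 * (sum(i * i * v for i, v in spectrum.items()) - n) / (n * n)

    return {"ttr": ttr, "hapaxes": hapaxes, "yule_k": yule_k}


# Toy input: four lemmas, one of them repeated.
sample = ["суд", "закон", "суд", "право"]
print(lexical_metrics(sample))
```

TTR depends strongly on text length, which is one reason length-robust measures such as Yule’s K are often reported alongside it.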

Language: Russian


Keywords: legal documents; linguistic complexity; lexical complexity; text readability; morphosyntactic complexity; discursive complexity; synchronous corpora of legal Russian

In book

Proceedings of the International Conference «Corpus Linguistics-2021»

St. Petersburg: Скифия-принт, 2021.

Geospatial effects on phonological complexity in the world’s languages

Hartmann F., Nichols J., Linguistic Typology 2025

Linguistic complexity has generally been seen as influenced by ecological, demographic, and sociolinguistic factors and has been approached by seeking correlations of increased complexity along one linguistic dimension with one or another extralinguistic factor. Here we use a multidimensional definition of phonological complexity and analyze its global patterning quantitatively across predefined continents or sets of ...

Added: July 26, 2025

Methodological Recommendations for Improving Text Readability: How to Write a Regulatory Legal Act in Plain Language?

Alimpeev D., Knutov A., Plaksin S. et al., Moscow: HSE Publishing House, 2024.

Korney Chukovsky designated clericalism as the sole genuinely substantial malady afflicting the Russian language, and Nora Gal likened the style of official documentation to the consumption of "dry food". In recent decades, it has become increasingly challenging to discern the content of normative documents. This phenomenon can be attributed to several factors, namely the proliferation ...

Added: December 23, 2024

Subjective Difficulty of the Hermitage Virtual Tour Texts: A Pilot Study

Kolmogorova P. A., Kulikova E. R., Человек: образ и сущность. Гуманитарные аспекты 2025 No. 2(62) P. 139–155

The article discusses how to assess the difficulty of the texts that accompany the virtual tour of the Main Museum Complex of the State Hermitage. How to measure difficulty, as opposed to complexity (a more objective, parameterizable characteristic of a text), remains an open question. The article describes the results of a pilot experiment in which informants evaluated the texts by marking and commenting on fragments that caused them difficulty. The analysis showed that the most frequent ...

Added: November 8, 2024

Linguistic Complexity of Texts in the “Virtual Museum Tour” Genre (Based on the Virtual Visit to the State Hermitage Museum)

Kolmogorova A., Kulikova E. R., Kolmogorova P. A., Текст. Книга. Книгоиздание 2025 No. 38 P. 29–54

The article is devoted to the linguistic characteristics of the texts of the Virtual Visit to the State Hermitage Museum, available on its official website. The purpose of the study is to analyze the set of lexical, morphological, syntactic and discursive metrics of the linguistic complexity of these texts in comparison with the same ...

Added: November 8, 2024

Modeling lemma frequency bands for lexical complexity assessment of Russian texts

Blinova O. V., Tarasov N., Blekanov I. et al., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference «Dialogue» (Moscow, June 17–20, 2020). Issue 19(26). Moscow: RSUH Publishing House, 2020. P. 76–92.

The paper is devoted to the problem of modeling general-language frequency using data of large Russian corpora. Our goal is to develop a methodology for forming a consolidated frequency list which in the future can be used for assessing lexical complexity of Russian texts. We compared 4 frequency lists developed from 4 corpora (Russian National Corpus, ...

Added: December 12, 2022

Assessing the Complexity of Russian Legal Texts: Model Architecture

Blinova O. V., Мир русского слова 2022 No. 2 P. 4–13

The paper describes a metrics-based model for assessing the complexity of Russian legal texts. The architecture of the model implies the use of 130 metrics divided into the following categories: “basic metrics”, “readability formulas”, “words of different part-of-speech classes”, “n-grams of part-of-speech tags”, “frequency of lemmas”, “word-building patterns”, “grammemes”, “lexical and semantic features, multi-word expressions”, “syntactic features”, ...

Added: October 29, 2022

Word-formation complexity: a learner corpus-based study

Lyashevskaya O., Pyzhak J.V., Vinogradova O. I., Russian Journal of Linguistics 2022 Vol. 26 No. 2 P. 471–492

This article explores the word-formation dimension of learner text complexity which indicates how skilful the non-native speakers are in using more and less complex - and varied - derivational constructions. In order to analyse the association between complexity and writing accuracy in word formation as well as interactive effects of task type, text register, and ...

Added: October 5, 2022

The Language of Regulatory Legal Acts: Is It Time to Sound the Alarm?

Knutov A., Chaplinskiy A., Alimpeev D., Вестник Пермского университета. Юридические науки 2022 No. 3(57) P. 399–426

Introduction: the article describes the experience of assessing the readability of regulatory legal acts by analyzing the complexity of their syntactic constructions. According to the subjective perception, normative texts become more complicated from year to year, which makes it difficult to interpret them and understand the legal meaning. Purpose: to test this hypothesis based on ...

Added: October 3, 2022

Decisions of Russian Constitutional Court: Lexical Complexity Analysis in Shallow Diachrony

Blinova O. V., Belov S., Revazov M., in: CEUR Workshop Proceedings (Proceedings of the International Conference "Internet and Modern Society" IMS-2020, 17-20 June 2020, ITMO University, St. Petersburg, Russia). CEUR Workshop Proceedings, 2020. Ch. 5 P. 61–74.

Added: November 1, 2020

Russian Official Documents of the “Healthcare” Domain and the Assessment of Their Lexical Complexity Using Keywords

Blinova O. V., Belov S. A., in: Proceedings of the International Conference «Corpus Linguistics-2019». St. Petersburg State University Press, 2019. P. 166–173.

The paper describes the first findings of a study of the comprehensibility of Russian official documents. The research material is the Corpus of Russian local documents and acts «CorRIDA» (the healthcare-domain subcorpus, consisting of 617,107 tokens). The study aims to identify lexical peculiarities of official documents using the method of extracting keywords, as well as to evaluate the obtained keywords ...

Added: November 1, 2020

Why is gender so complex? Some typological considerations

Nichols J., in: Grammatical gender and linguistic complexity. Vol. 1: General issues and specific studies. Berlin: Language Science Press, 2019. Ch. 4 P. 63–92.

A cross-linguistic survey shows that languages with gender can have very high levels of morphological complexity, especially where gender is coexponential with case as in many Indo-European languages. If languages with gender are complex overall, apart from their gender, then gender can be regarded as an epiphenomenon of overall language complexity that tends to arise ...

Added: November 4, 2019

Grammatical gender and linguistic complexity

Berlin: Language Science Press, 2019.

The many facets of grammatical gender remain one of the most fruitful areas of linguistic research, and pose fascinating questions about the origins and development of complexity in language. The present work is a two-volume collection of 13 chapters on the topic of grammatical gender seen through the prism of linguistic complexity. The contributions discuss ...

Added: November 4, 2019

Automated assessment of learner text complexity

Lyashevskaya O., Irina Panteleeva, Olga Vinogradova, Assessing Writing 2021 No. 49 Article 100529

EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, and over the past two decades there appeared various tools for the provision of automated instant feedback. The presented paper offers an application that focuses on measuring text complexity, ...

Added: October 20, 2019

Inspector: The Tool For Automated Assessment Of Learner Text Complexity

Olga I. Vinogradova, Olga N. Lyashevskaya, Irina M. P. / NRU Higher School of Economics. Series WP BRP 55/LNG/2017. 2019. No. 79.

EFL methodology has always recognized the importance of giving student learners of foreign languages regular and quick feedback on student speech production, both written and oral, but over the past two decades there appeared various tools ensuring the provision of automated instant feedback. The presented paper offers such a tool that focuses on measuring text ...

Added: October 10, 2019

The Russian Language: Status and Dynamics of Development at the Present Stage

Somin A., Piperski A., Krongauz M. et al. / Russian Academy of National Economy and Public Administration. SSRN "working papers series". 2014.

Language change concerning various levels of linguistic description, such as vocabulary, semantics, and language etiquette, serves as a major research topic within this project. However, the description of language change is more valuable if we manage to discover its causes and propose an external interpretation of linguistic processes. For this reason, we study the dynamics ...

Added: March 17, 2016