Frequency dictionary of inflectional paradigms: core Russian vocabulary

O. Lyashevskaya

Basic Research Programme, 2013.

A new kind of frequency dictionary is a valuable reference for researchers and learners of Russian. It shows the grammatical profiles of nouns, adjectives and verbs, namely, the distribution of grammatical forms in the inflectional paradigm. The dictionary is based on data from the Russian National Corpus (RNC) and covers the core vocabulary (the 5,000 most frequent lexemes). Russian is a morphologically rich language: its noun paradigms comprise two dozen case and number forms, and its verb paradigms include up to 160 grammatical forms. The dictionary departs from traditional frequency lexicography in several ways: 1) word forms are arranged in paradigms, so their frequencies can be compared and ranked; 2) the dictionary focuses on the grammatical profiles of individual lexemes rather than on the overall distribution of grammatical features (e.g. the fact that Future forms are used less frequently than Past forms); 3) the grammatical profiles of lexical units can be compared against the mean scores of their lexico-semantic class; 4) within each part of speech or semantic class, lexemes with particular biases in their grammatical profile are easy to detect (e.g. verbs used mostly in the Imperative or in the Past neuter, or nouns often used in the plural); 5) the distribution of homonymous word forms and grammatical variants can be traced over time and across genres and registers. The dictionary will serve as a source for research on Russian grammar, paradigm structure, form acquisition, grammatical semantics, and the variation of grammatical forms. The main challenge for this initiative is the intra-paradigm and inter-paradigm homonymy of word forms in corpus data. Manual disambiguation is accurate but covers only about 5 million words of the RNC, so the data may be sparse and possibly unreliable. Automatic disambiguation yields slightly worse results; however, a larger corpus provides more reliable data for rare word forms.
A user can switch between a ‘basic’ version, based on a smaller collection of manually disambiguated texts, and an ‘expanded’ version, based on the main corpus, the newspaper corpus, the poetry corpus and the spoken corpus (320 million words in total). The article addresses some general issues such as establishing a common basis for comparison, the level of granularity of grammatical profiles, and units of measurement. We suggest solutions related to data selection, corpus data processing and the maintenance of the online version of the frequency dictionary.
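The notion of a grammatical profile described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dictionary's actual implementation. It assumes a table of token counts per (lemma, tag) pair. The lemmas, the simplified case.number tags, the counts and the 0.3 threshold are all invented, and the function names are hypothetical. The sketch computes each lexeme's profile (the share of each form in its paradigm), ranks the forms, and flags lexemes whose share of a feature exceeds the class mean by a chosen margin.

```python
from statistics import mean

# Invented toy counts for illustration only (NOT real RNC frequencies):
# (lemma, grammatical tag) -> number of tokens in a hypothetical corpus.
# Tags are simplified "case.number" labels for nouns.
TOY_COUNTS = {
    ("glaz", "nom.sg"): 40, ("glaz", "nom.pl"): 120, ("glaz", "gen.pl"): 40,
    ("dom", "nom.sg"): 150, ("dom", "gen.sg"): 80, ("dom", "nom.pl"): 20,
    ("stol", "nom.sg"): 90, ("stol", "gen.sg"): 60, ("stol", "nom.pl"): 10,
}


def grammatical_profile(counts, lemma):
    """Relative frequency of each grammatical form within one lexeme's paradigm."""
    forms = {tag: n for (lem, tag), n in counts.items() if lem == lemma}
    total = sum(forms.values())
    return {tag: n / total for tag, n in forms.items()}


def ranked_forms(profile):
    """Forms ordered from most to least frequent (ties keep their input order)."""
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)


def feature_share(profile, feature):
    """Total share of the forms whose tag carries a feature value, e.g. 'pl'."""
    return sum(share for tag, share in profile.items() if feature in tag.split("."))


def biased_lexemes(profiles, feature, threshold):
    """Lexemes whose feature share exceeds the class mean by more than `threshold`."""
    class_mean = mean([feature_share(p, feature) for p in profiles.values()])
    return [lemma for lemma, p in profiles.items()
            if feature_share(p, feature) - class_mean > threshold]


if __name__ == "__main__":
    lemmas = sorted({lem for lem, _ in TOY_COUNTS})
    profiles = {lem: grammatical_profile(TOY_COUNTS, lem) for lem in lemmas}
    print(ranked_forms(profiles["glaz"]))
    print(biased_lexemes(profiles, "pl", threshold=0.3))
```

With these toy counts, glaz is flagged as plural-biased: 80% of its tokens are plural, against a class mean of about 31%. In real corpus data, every count would first depend on resolving homonymous word forms, which is the challenge the abstract identifies.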

Research target: Philology and Linguistics

Priority areas: humanities

Language: English

Keywords: Russian language, variation, Russian National Corpus (RNC), frequency dictionary, grammatical profile of a lexeme, inflection, grammatical homonymy, grammatical variation, homonymous word forms

A frequency lexico-grammatical dictionary: project prospectus

Lyashevskaya O., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (Bekasovo, May 29 – June 2, 2013). In 2 vols. Vol. 1: Main conference program. Issue 12 (19). Moscow: RSUH, 2013. P. 478–489.

A new electronic frequency dictionary shows the distribution of grammatical forms in the inflectional paradigm of Russian nouns, adjectives and verbs, i.e. the grammatical profile of individual lexemes and lexical groups. While the frequency hierarchy of grammatical categories (e.g. the frequency of part of speech classes or the average ratio of Nominative to Instrumental case ...

Added: May 13, 2013

An electronic database of variation phenomena

Dobrushina N., Staferova D. A., Belokon A. A., Slověne 2018 No. 1 P. 424–436

The paper is an overview of the Repository of Variationist Research (https://vastry.ru/), an online storage and interactive plotting tool for quantitative sociolinguistic data. The paper describes a number of sociolinguistic experiments from which the data come and outlines the Repository and the toolkit it provides to its users. ...

Added: August 18, 2018

A Data Analysis Tool for the Corpus of Russian Poetry

Lyashevskaya O., Vlasova E., Litvintseva K. et al. / NRU HSE. Series WP BRP "Linguistics". 2018. No. 77.

A data analysis tool for the Corpus of Russian Poetry (part of the Russian National Corpus) is designed for quantitative research in various areas of versology and in linguistic aspects of poetic texts. Its core, a statistical database of the corpus, includes annotation at the level of texts, verses and words, as well as patterns ...

Added: December 13, 2018

Corpus tools in grammatical research on Russian

Lyashevskaya O., Moscow: Yazyki slavyanskoy kultury, 2016.

Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...

Added: March 26, 2015

Stylistically marked verbs in Russian: sovat’–sunut’

Rakhilina E. V., Tomsk State University Journal 2015

The paper deals with the morphosyntactic and stylistic properties of the Russian verb SUNUT’ and argues for their semantic motivation. SUNUT’ is usually considered one of the “putting verbs” (verbs denoting change of location), but it has some peculiarities in its syntax, derivational patterns, semantics and stylistics. Unlike other verbs of this taxonomic class, SUNUT’ profiles ...

Added: June 2, 2015

Adjectival means of expressing anteriority in subject, object and predicative groups in French and Russian (a contrastive analysis)

Naberezhnova Z. G., Almanac of Modern Science and Education 2010 No. 11(42) Pt. 1 P. 161–167

Time is a universal interdisciplinary category studied by various sciences. Linguistic time is diverse in its manifestations and is conveyed not only by the tense forms of the verb but also by non-verbal means. Every part of speech contains lexico-semantic groups with temporal meaning. In studying the non-verbal means of expressing temporal relations, it is useful to distinguish three levels of analysis: the level of content, the level of expression and the level of functioning. In ...

Added: November 7, 2012

Materials for a corpus grammar of Russian

St. Petersburg: Nestor-Istoriya, 2018.

The volume is the third issue of a corpus-based grammar of Russian. It deals with parts of speech and, more generally, with the formal classes of the lexicon. It comprises descriptive papers on individual parts of speech and minor word classes. ...

Added: November 4, 2018

Predicate agreement with the words r’ad, polovina, chast’, mnozestvo in modern Russian

Kuvshinskaya Y. M., Siberian Journal of Philology 2019 No. 2 P. 189–215

The work deals with strategies of predicate agreement with quantified noun phrases headed by nouns. In Russian, as in other Slavic languages, predicate agreement with quantified noun phrases allows either singular or plural forms of the predicate. In sentences with the quantifier nouns r’ad, polovina, chast’, mnozestvo, three agreement strategies are possible: the predicate agrees with ...

Added: September 8, 2019

The law on the state language of the Russian Federation: a history of debates and amendments

Krongauz M., Zeitschrift für Slavische Philologie 2016 Vol. 72 No. 2 P. 255–269

The law “On the state language of the Russian Federation,” adopted in 2005, became a reflection of debates about the Russian language at the beginning of the twenty-first century and caused frustration in both conservative and liberal segments of society. On multiple occasions, attempts were made to amend the law or enact additional laws aimed ...

Added: October 22, 2017

The intensifier “do uzhasa” in Russian on the path of grammaticalization

Gerasimov D. V., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2016 Vol. XII No. 1 P. 336–363

The paper presents a corpus-driven study of the Russian PP-based degree modifier do uzhasa (lit. ‘to horror’), suggesting a two-stage grammaticalization path. The first stage (presumably the 18th–19th centuries) involves subjectification, while in the second stage, subjective readings give rise to intensifier readings through conceptual metonymy. Both stages see a host class expansion. This process is ...

Added: November 27, 2017

The concept of “bankruptcy” in the coordinates of legal linguistics: Russian–English–French approximations

Vlasenko S. V., Galimov A. R., Herald of Tver State University. Series: Philology 2012 Vol. 10 No. 2 P. 21–28

The article addresses the notion of bankruptcy as perceived by speakers of present-day Russian, English and French, both lawyers and participants in professional communication from other fields. The semantic structure of the term is identified on the basis of its lexicographic and regulatory definitions. ...

Added: October 4, 2012

Changes in the place and manner of articulation of consonants at word boundaries in some biphonemic clusters

Duryagin P., Proceedings of the Southwest State University. Series: Linguistics and Pedagogy 2015 No. 2 P. 78–88

The paper presents the results of a phonetic experiment on changes in the place and manner of articulation in four biphonemic consonant clusters in Modern Standard Russian. The rules of consonant assimilation and coarticulation are reviewed, and their application in internal and external sandhi positions is compared. ...

Added: October 15, 2016

On the sense of duty as a language-specific concept of Russian (through the lens of the Russian National Corpus)

Botchkarev A., Vestnik of Saint Petersburg University. Language and Literature 2019 Vol. 16 No. 1 P. 20–32

The article explores the sense of duty as a language-specific concept in the Russian linguistic consciousness. The Russian National Corpus is particularly suitable for this purpose, because the conceptual configuration of the concept under analysis is not present in “finished” form in any single utterance and can be reconstructed only from the totality of all possible ...

Added: April 2, 2019

The place of the imperative in adult–child communication

Ivanova K. A., Acta Linguistica Petropolitana. Transactions of the Institute for Linguistic Studies 2015 Vol. 11 No. 1 P. 565–584

Requests and commands expressed by imperative verb forms appear in the earliest period of language acquisition. For some verbs, the imperative may be the first form to be acquired and appears to be the initial step towards acquiring the full paradigm. In this article, the typical imperative contexts of child–adult communication ...

Added: October 29, 2016

The rise of a lingua franca: The case of Russian in Dagestan

Dobrushina N., Kultepina O., International Journal of Bilingualism 2021 Vol. 25 No. 1 P. 338–358

Aims and objectives: In Dagestan, Russian is the language of education, urban life and upward social mobility, and the means of communication between speakers of different languages. This is the result of a rapid and drastic change. At the end of the 19th century, Russian was spoken by less than 1% of the population. ...

Added: October 14, 2020

Comparative Russian–Spanish studies: theoretical and methodological aspects

Granada: Jizo Ediciones, 2011.

The conference proceedings contain the full texts of papers on the topics “Russian–Spanish contrastive studies”, “The image of Russia and Spain in literature, history and cultural studies” and “Russian and Spanish in the theory and practice of translation”. ...

Added: February 26, 2013

A parametric treatment of “poka... ne”: solutions and problems

Tiskin D. B., Typology of Morphosyntactic Parameters 2018 Vol. 1 No. 1 P. 136–150

The paper aims to further advance the line of research in which all uses of the Russian subordinator poka are given a unified semantics and all constructions involving it are assumed to be fully compositional. I focus on several further issues, viz. whether it is simultaneity or precedence that lies at the core of the meaning ...

Added: February 19, 2019

Person as an inflectional category

Nichols J., Linguistic Typology 2017 Vol. 21 No. 3 P. 387–456

The category of person has both inflectional and lexical aspects, and the distinction provides a finely graduated grammatical trait, relatively stable in both families and areas, and revealing for both typology and linguistic geography. Inflectional behavior includes reference to speech-act roles, indexation of arguments, discreteness from other categories such as number or gender, assignment and/or placement in syntax, arrangement in ...

Added: November 14, 2017

Faces of bilingualism

St. Petersburg: Zlatoust, 2016.

This book is a collection of papers by Russian and foreign linguists highlighting different aspects of bilingualism. Much attention is paid to early simultaneous and successive bilingualism in children; however, adults speaking several languages in natural settings as well as in the classroom are also considered. Some chapters focus on language attrition, an ...

Added: October 2, 2016

On the deep semantics and functional-referential properties of adjectival markers of anteriority in French and Russian

Naberezhnova Z. G., Almanac of Modern Science and Education 2010 No. 12 P. 226–231

...

Added: November 23, 2012

On pseudo-identification in Russian (based on designations of persons in Russian literary texts)

Botchkarev A., Vestnik of Novosibirsk State University. Series: Linguistics and Intercultural Communication 2020 Vol. 18 No. 2 P. 5–12

The only way to identify an animate object is to describe its specific characteristics, which necessarily appear as deviations from the normal “average” pattern, here called the paragon, in which the Axiological Standard of a human group is fixed. Of particular heuristic interest in this regard is a logical pattern often used in Russian to describe such ...

Added: July 28, 2020

The realization of homorganic plosive clusters at word boundaries in modern Russian

Duryagin P., Philological Sciences. Issues of Theory and Practice 2015 Vol. 1 No. 7 P. 56–63

The article presents the results of an experimental study of the phonetic realization of sequences of two homorganic plosives at the boundary between phonetic words in modern Russian. The experiment shows that, in these positions, coarticulation rules produce a single plosive, having subject to ...

Added: October 14, 2016

Russian in internet communication: linguocognitive and pragmatic aspects

Radbil T. B., Ratsiburskaya L. V., Shchenikova E. V. et al., Moscow: Flinta, 2021.

The collective monograph “Russian in internet communication: linguocognitive and pragmatic aspects” analyzes the current state of Russian on the Internet and offers a linguocultural interpretation of active processes in internet communication through the lens of linguocognitive and linguopragmatic approaches ...

Added: April 13, 2021

The mechanism of semantic calquing and its role in filling defective number paradigms of abstract nouns in modern Russian

Gorbov A. A., Vestnik of Saint Petersburg University. Series 9. Philology. Asian Studies. Journalism 2015 No. 2 P. 96–104

The paper analyses the criteria for determining semantic calques in modern Russian based on typical examples. The analysis shows that a semantic calque can be clearly attested only when there is a pre-established translation correspondence between the words of the source language and the target language. Special attention is paid to the examples of semantic ...

Added: October 4, 2015