A corpus-based quantitative approach to the study of morphological productivity in diachrony: The case of samo-compounds in Russian

Naccarato C.



P. 133–152.

This paper investigates the productivity of the prefixoid samo- (‘self’) in Russian compounds from a diachronic perspective. To test the hypothesis that the productivity of this prefixoid has grown over time, I consider the occurrences of samo-compounds in the Russian National Corpus, dividing the main corpus into four subcorpora, each representing a particular time span: the 18th century, the 19th century, the 20th century, and the period from the beginning of the 21st century to the present day. The approach is quantitative and is based on the measure of “potential productivity” (Baayen & Lieber 1991; Baayen 1992, 1993), which is calculated by dividing the number of hapax legomena with a certain affix by the number of tokens with that affix. This measure, however, seems inadequate for comparing corpora of different sizes, since its value depends on the size of the sample. To overcome this problem, I resort to parametric statistical models of frequency distribution known as LNRE (Large Number of Rare Events) models (Baayen 2001). These models, which make it possible to extrapolate the expected numbers of types and hapax legomena with a given affix to arbitrary token counts, are implemented in zipfR (Baroni & Evert 2014), a package for lexical statistics in R, which is used in this study.
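
The potential productivity measure described above can be illustrated with a minimal Python sketch. The function name `potential_productivity` and the toy token list are hypothetical and are not taken from the paper. The study itself relies on the R package zipfR for LNRE modelling, which this sketch does not reproduce.

```python
from collections import Counter


def potential_productivity(tokens):
    """Return (V, V1, N, P) for a list of tokens that share one affix.

    V  = number of types
    V1 = number of hapax legomena (types occurring exactly once)
    N  = number of tokens
    P  = V1 / N, the potential productivity
    """
    freq = Counter(tokens)
    n_tokens = sum(freq.values())
    n_types = len(freq)
    n_hapax = sum(1 for count in freq.values() if count == 1)
    p = n_hapax / n_tokens if n_tokens else 0.0
    return n_types, n_hapax, n_tokens, p


# Hypothetical toy sample of transliterated samo-compounds (not corpus data).
sample = [
    "samolet", "samolet", "samolet",
    "samovar", "samovar",
    "samokritika",
    "samoobman",
]

v, v1, n, p = potential_productivity(sample)
print(f"V={v}, V1={v1}, N={n}, P={p:.3f}")
```

On a real subcorpus, the token list would contain every occurrence of a samo-compound in the relevant time span. Because P tends to fall as the sample grows, values obtained from subcorpora of different sizes are not directly comparable, which is the motivation for the LNRE extrapolation mentioned in the abstract.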

Language: English


Keywords: Russian language; Russian National Corpus; compounds; morphological productivity

In book

A blend of MaLT: Selected contributions from the Methods and Linguistic Theories Symposium 2015

University of Bamberg Press, 2016.

Корпусные инструменты в грамматических исследованиях русского языка

Lyashevskaya O., М.: Языки славянской культуры, 2016.

Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...

Added: March 26, 2015

Frequency dictionary of inflectional paradigms: core Russian vocabulary

Lyashevskaya O. / Series HUM "Humanities". 2013.

A new kind of frequency dictionary is a valuable reference for researchers and learners of Russian. It shows the grammatical profiles of nouns, adjectives and verbs, namely, the distribution of grammatical forms in the inflectional paradigm. The dictionary is based on data from the Russian National Corpus (RNC) and covers a core vocabulary (5000 most ...

Added: May 13, 2013

Disambiguation in context in the Russian National Corpus: 20 years later

Lyashevskaya O., Afanasev I., Rebrikov S. et al., in: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной международной конференции «Диалог». Вып. 22. [б.и.], 2023. P. 307–318.

An updated annotation of the Main, Media, and some other corpora of the Russian National Corpus (RNC) features the part-of-speech and other morphological information, lemmas, dependency structures, and constituency types. Transformer-based architectures are used to resolve the homonymy in context according to a schema based on the manually disambiguated subcorpus of the Main corpus (morphology ...

Added: September 15, 2023

I composti verbali in russo

Naccarato C., L'analisi linguistica e letteraria (Italy) 2015 Vol. 23 No. 1 P. 77–92

The paper is a descriptive study of verb-based compounds in Russian. After a brief description of the main characteristics of such compounds and of the criteria used for their identification, compounds are classified on the basis of their morphological structure. ...

Added: October 4, 2018

Частотный лексико-грамматический словарь: проспект проекта

Lyashevskaya O., В кн.: Компьютерная лингвистика и интеллектуальные технологии: По материалам ежегодной Международной конференции «Диалог» (Бекасово, 29 мая – 2 июня 2013 г.). В 2-х т. Т. 1: Основная программа конференции. Вып. 12 (19). М.: РГГУ, 2013. С. 478–489.

A new electronic frequency dictionary shows the distribution of grammatical forms in the inflectional paradigm of Russian nouns, adjectives and verbs, i.e. the grammatical profile of individual lexemes and lexical groups. While the frequency hierarchy of grammatical categories (e.g. the frequency of part of speech classes or the average ratio of Nominative to Instrumental case ...

Added: May 13, 2013

A Data Analysis Tool for the Corpus of Russian Poetry

Lyashevskaya O., Vlasova E., Litvintseva K. et al. / NRU HSE. Series WP BRP "Linguistics". 2018. No. 77.

A data analysis tool for the Corpus of Russian Poetry (a part of the Russian National Corpus) is designed for quantitative research in various areas of versology and on linguistic aspects of poetic texts. The core part, a statistical database of the corpus, includes annotation at the level of texts, verses, words as well as patterns ...

Added: December 13, 2018

О псевдоидентификации в русском языке (на примере обозначений человека в русских литературных текстах)

Botchkarev A., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Т. 18 № 2 С. 5–12

There is no way to identify an animate object other than to describe its specific characteristics which necessarily look like deviations from the normal “average” pattern, named here paragon, in which the Axiological Standard of a human group is fixed. Of particular heuristic interest is, in this regard, the logical pattern, often used in Russian for describing such ...

Added: July 28, 2020

О чувстве долга как лингвоспецифичном концепте русского языка (в фокусе Национального корпуса русского языка)

Botchkarev A., Вестник Санкт-Петербургского университета. Язык и литература 2019 Т. 16 № 1 С. 20–32

The article explores the sense of duty as a language-specific concept in Russian linguistic consciousness. In this regard, the Russian National Corpus is more appropriate because the conceptual configuration of an analyzed concept is not present in “finished” form in any single utterance but may be reconstructed only on the totality of all possible ...

Added: April 2, 2019

Русский язык в интернет-коммуникации: лингвокогнитивный и прагматический аспекты

Радбиль Т. Б., Рацибурская Л. В., Щеникова Е. В. et al., М.: Флинта, 2021.

The collective monograph «Русский язык в интернет-коммуникации: лингвокогнитивный и прагматический аспекты» analyzes the current state of the Russian language on the Internet and offers a linguocultural interpretation of active processes in internet communication through the lens of linguo-cognitive and linguo-pragmatic approaches ...

Added: April 13, 2021

Предикативное согласование со словами ряд, половина, часть, множество в современном русском языке

Kuvshinskaya Y. M., Сибирский филологический журнал 2019 № 2 С. 189–215

The work deals with the strategies for predicate agreement with quantified noun groups headed by nouns. In Russian, as in other Slavic languages, predicate agreement with quantified noun phrases allows singular or plural forms of the predicate. As for sentences with the quantifier nouns r’ad, polovina, chast’, mnozestvo, three agreement strategies are possible: predicate agrees with ...

Added: September 8, 2019

Изменения согласных по месту и способу образования на стыках слов в некоторых двухфонемных сочетаниях

Duryagin P., Известия Юго-Западного государственного университета. Серия: Лингвистика и педагогика 2015 № 2 С. 78–88

The paper presents the results of a phonetic experiment on changes of place and manner of articulation in four biphonemic consonant clusters in Modern Standard Russian. The rules for assimilation and coarticulation of consonants are reviewed, and the application of these rules in internal and external sandhi positions is compared. ...

Added: October 15, 2016

FrameBank: a database of Russian lexical constructions

Lyashevskaya O., Kashkin E., in: Analysis of Images, Social Networks and Texts. 4th International Conference, AIST 2015, Yekaterinburg, Russia, April 9–11, 2015, Revised Selected Papers. Vol. 542: Series: Communications in Computer and Information Science. Switzerland: Springer, 2015. Ch. 34. P. 337–348.

Russian FrameBank is a bank of annotated samples from the Russian National Corpus which documents the use of lexical constructions (e.g. argument constructions of verbs and nouns). FrameBank belongs to FrameNet-oriented resources, but unlike Berkeley FrameNet it focuses more on the morphosyntactic and semantic features of individual lexemes rather than the generalized frames, following the ...

Added: April 11, 2015

Русский язык и новые технологии

М.: Новое литературное обозрение, 2014.

Changes in modern Russian due to the expansion of the new technologies; Russian of the Internet (Runet). Social and cultural consequences of the CMC-revolution. ...

Added: February 3, 2014

Predicting complex syntactic structure in real time: Processing of negative sentences in Russian

Kazanina N., The Quarterly Journal of Experimental Psychology 2017 Vol. 70 No. 11 P. 2200–2218

In Russian negative sentences the verb’s direct object may appear either in Accusative case which is licensed by the verb (as is common cross-linguistically) or in Genitive case which is licensed by the negation (Russian-specific ‘Genitive-of-Negation’ phenomenon). Such sentences were used to investigate whether case marking is employed for anticipating syntactic structure, and whether lexical ...

Added: November 28, 2018

From quantitative to semantic analysis: Russian constructions with dative subjects in diachrony

Bonch-Osmolovskaya A. A., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 158–174.

The paper presents a diachronic study of dative subject constructions with predicatives in Russian. The dataset, drawn from a corpus of 19th–21st-century texts, is analysed with a clustering method, and the classes of predicates that exhibit similar behaviour are defined. A semantic interpretation is proposed for the observed distribution. ...

Added: July 14, 2017

TEXTS OF DIFFERENT EMOTIONAL CLASSES AND THEIR TOPIC MODELING

Kolmogorova A., Qiuhua S., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2025 Vol. 23 No. 5

The article is devoted to studying verbalization specifics of various emotional states in the texts in Russian with the purpose to confirm or refute the hypothesis that texts of different emotional classes reflect the denotative situation not identically, which is reflected in thematic specifics and lexical content. The research material consisted of eight corpus texts ...

Added: November 29, 2024

The rise of a lingua franca: The case of Russian in Dagestan

Dobrushina N., Kultepina O., International Journal of Bilingualism 2021 Vol. 25 No. 1 P. 338–358

Aims and objectives: In Dagestan, Russian is the language of education, urban way of life, and upward social mobility, and the means of communication between speakers of different languages. This is a result of a quick and drastic change. At the end of the 19th century, Russian was spoken by less than 1% of the population. ...

Added: October 14, 2020

Verb-verb compounds: delimiting the concept and towards the study of ordering principles

Vinyar A., Acta linguistica Petropolitana 2023

This study is devoted to the verb-verb compounds, namely monoclausal complex predicates, in which two or more verbal stems are integrated in a single grammatical word. I critically assess previous approaches to these constructions and their relations to serial verb constructions, a broader family of monoclausal complex predicates. Firstly, I provide a framework in which ...

Added: May 15, 2023

О глубинной семантике и функционально-референциальных особенностях адъективных показателей предшествования во французском и русском языках

Naberezhnova Z. G., Альманах современной науки и образования 2010 № 12 С. 226–231

...

Added: November 23, 2012

"Поверх очков": пространственные интерпретации и семантика предложной конструкции

Lyashevskaya O., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2014 Т. X № 2 С. 332–361

The preposition поверх belongs to the non-primary (derived) prepositions, which have simpler semantics than polysemous primary prepositions. We represent the semantic structure of the preposition's uses as a radial category that links different image schemas. The classes of uses are distinguished on the basis of the topological type of the figure and the ground, as well as the functional relations between them. What is unusual about the category is that ...

Added: October 7, 2014

A Reusable Tagset for the Morphologically Rich Language in Change: a Case of Middle Russian

Lyashevskaya O., in: Computational Linguistics and Intellectual Technologies. Issue 18. M.: Russian State University for the Humanities, 2019. P. 422–434.

The paper discusses the standardization efforts to create a morphological standard for the Middle Russian corpus, which is part of the historical collection of the Russian National Corpus (RNC). To meet the needs of different categories of corpus researchers as well as NLP developers, we consider two styles of the morphological annotation (RNC schema and ...

Added: June 12, 2019

«Мигрант» и «миграция» по данным словарей и лингвистических корпусов русского, чешского и немецкого языков

Sibirtseva V., Крылова Л.К., В кн.: Мультикультурализм или интеркультурализм? Опыт Австрии, России, Европы. Т. 9. Н. Новгород: Деком, 2013. С. 78–86.

The topic of the article reflects the relationship to the concepts of "migration" and "worker" in Russia, the Czech Republic and in German-speaking countries over the past 30 years. Frequency of use of these words is confirmed by the fact that migration is a very difficult and complex problem to solve. Language is sensitive to ...

Added: October 4, 2013

Стилистически маркированные глаголы в русском языке: совать-сунуть

Rakhilina E. V., Вестник Томского государственного университета 2015

The paper deals with the morphosyntactic and stylistic properties of the Russian verb SUNUT’ and argues for their semantic motivation. SUNUT’ is usually considered as one of “putting verbs” (denoting change of location), but it has some peculiarities in its syntax, derivational patterns, semantics and stylistics. Unlike other verbs of this taxonomic class, SUNUT’ profiles ...

Added: June 2, 2015

Corpus-based profiles of Russian nouns: from grammatical number to lexical semantics

Lyashevskaya O. / NRU HSE. Series WP BRP "Linguistics". 2015.

A grammatical profile, which indicates the relative frequency distribution of the inflected forms of a word in a corpus, is a tool for exploring lexical semantics. However, the previous attempts to infer semantically relevant hierarchies of nouns from frequency biases within their grammatical forms seem to have failed. In this paper we explore the distinctive ...

Added: April 15, 2015