Quantitative approaches to the Russian language

doi:10.4324/9781315105048



Abingdon: Routledge, 2018.

Editors: Lyashevskaya O., Kopotev M., Mustajoki A.

Compiler: M. Kopotev

This edited collection presents a range of methods that can be used to analyse linguistic data quantitatively. A series of case studies of Russian data spanning different aspects of modern linguistics serve as the basis for a discussion of methodological and theoretical issues in linguistic data analysis. The book presents current trends in quantitative linguistics, evaluates methods and presents the advantages and disadvantages of each. The chapters contain introductions to the methods and relevant references for further reading.

The Russian language, despite being one of the most studied in the world, has until recently been little explored quantitatively. After a burst of research activity in the years 1960–1980, quantitative studies of Russian vanished. They are now reappearing in an entirely different context. Today we have large and deeply annotated corpora available for extended quantitative research, such as the Russian National Corpus, ruWaC, and ruTenTen, to name just a few (websites for these and other resources can be found in a special section in the References). The present volume is intended to bridge the gap between the available data and the methods that can be applied to studying them.

Our goal is to present current trends in researching Russian quantitative linguistics, to evaluate the research methods vis-à-vis Russian data, and to show both the advantages and the disadvantages of the methods. We especially encouraged our authors to focus on evaluating statistical methods and new models of analysis. New findings concern applicability, evaluation, and the challenges that arise from using quantitative approaches to Russian data.

Automated Word Sense Frequency Estimation for Russian Nouns

Lopukhina A., Лопухин К. А., Носырев Г. В., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 79–94.

According to G. K. Zipf’s observation, there is a strong correlation between word frequency and polysemy. Yet word sense frequency distribution is a neglected area in computational linguistics. Furthermore, the study of sense frequency has theoretical interest and practical applications for lexicography and word sense disambiguation. Although WordNet and SemCor contain some information about sense frequency ...

Added: October 11, 2016

From quantitative to semantic analysis: Russian constructions with dative subjects in diachrony

Bonch-Osmolovskaya A. A., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 158–174.

The paper presents a diachronic study of dative subject constructions with predicatives in Russian. A dataset drawn from a corpus of 19th–21st-century texts is analysed with a clustering method, and classes of predicates that exhibit similar behaviour are defined. A semantic interpretation is proposed for the observed distribution. ...

Added: July 14, 2017

The grammatical profiles of Russian biaspectual verbs

Piperski A., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. Ch. 6 P. 115–136.

Russian has a relatively large group of biaspectual verbs, which can be used to convey both perfective and imperfective meaning. However, some of these verbs are used more often in perfective contexts and others in imperfective contexts, which is likely to influence the direction of the further development of overt aspectual oppositions in these verbs ...

Added: October 19, 2017

Looking for contextual cues to differentiating modal meanings: A corpus-based study

Lyashevskaya O., Ovsjannikova M., Szymor N. et al., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 51–78.

The domain of modality is structurally diverse and may be described in multiple ways (for example, see Perkins, 1983; Wierzbicka, 1987; Hengeveld, 1988/2004; Sweetser, 1990; Bondarko, 1990; Bybee et al., 1994; van der Auwera and Plungian, 1998; Palmer, 2001; Hansen, 2004; Nuyts, 2006; Khrakovsky, 2007). The article reports on the Russian part of a larger survey ...

Added: October 24, 2017

Russian challenges for quantitative research

Kopotev M., Lyashevskaya O., Mustajoki A., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 3–29.

Added: October 24, 2017

From quantitative to semantic analysis: Russian constructions with dative subjects in diachrony

Anastasia Bonch-Osmolovskaya, in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. Ch. 8 P. 158–174.

The chapter demonstrates how quantitative corpus methods used in linguistic research may help to rank different realizations of the same phenomenon: the use of dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that potentially can be ...

Added: March 15, 2018

Evaluation of collocation extraction methods for the Russian language

Pivovarova L., Kormacheva D., Kopotev M., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 137–157.

This paper focuses on empirical collocations, understood here as word co-occurrences that 1) are frequent enough to be extracted automatically and 2) may be semantically and/or syntactically bound to various extents. Our main goal is to examine closely five window-based methods for empirical collocation extraction that are widely used in corpus-based studies, sometimes without proven ...

Added: September 30, 2020

Russian challenges for quantitative research

Kopotev M., Lyashevskaya O., Mustajoki A., in: Quantitative approaches to the Russian language. Abingdon: Routledge, 2018. P. 3–29.

Abstract: The introductory chapter presents current trends in researching the Russian language quantitatively. It starts with a short description of the main features of Russian grammar to help the reader follow this book without deep knowledge of the language. The main part surveys quantitative studies of Russian conducted in the 2000s and 2010s. We first address the ...

Added: September 30, 2020

Research target: Philology and Linguistics, Mathematics, Computer Science

Priority areas: humanities

Language: English


Keywords: Russian language, quantitative analysis, quantitative data analysis, Russian linguistics


Universal Dependencies for Russian: A New Syntactic Dependencies Tagset

Lyashevskaya O., Droganova K., Zeman D. et al. / NRU HSE. Series WP BRP "Linguistics". 2016. No. 44.

This paper presents the Universal Dependencies tagset (UD v1) as a new annotation scheme for Russian treebanks. The universal list of dependency relations was adopted and extended to comply with certain language-specific syntactic constructions. The tagset was validated by converting two Russian treebanks into the UD format, yielding UD-Russian-SynTagRus and UD-Russian-Google. ...

Added: December 14, 2016

Welcome to the club: Designing the inventory of semantic roles for adjectives

Lyashevskaya O., Kashkin E., Компьютерная лингвистика и интеллектуальные технологии 2016 No. 15 P. 440–454

The argument constructions of adjectives have largely been outside the scope of research on semantic roles in both theoretical and IT fields. Before adding the roles of adjectival arguments to the network of semantic roles it is important to determine whether the adjectival roles form a separate list or whether they can be seen ...

Added: December 14, 2016

Towards Automatic Text Adaptation in Russian Language

Karpov N., Sibirtseva V. / NRU HSE. Series WP BRP "Linguistics". 2014.

This article describes ways to use original texts from the Russian National Corpus, as well as news texts, for teaching Russian as a foreign language. It analyzes two years of work by CorpLings, a research group at the Higher School of Economics (Nizhny Novgorod and Moscow). Special attention is paid to the basic principles of the research part of the project ...

Added: December 10, 2014

Inducing verb classes from frames in Russian: morpho-syntax and semantic roles

Кашкин Е. В., Компьютерная лингвистика и интеллектуальные технологии 2015 Vol. 21 P. 427–440

The paper presents clustering experiments on Russian verbs based on the statistical data drawn from the Russian FrameBank (framebank.ru). While lexicology has essentially abandoned the idea of syntactic transformations as the primary basis for grouping verbs into semantic classes (Apresjan 1967, Levin 1993), the hypothesis of the same lexical and syntactic distributional profiles underlying lexical ...

Added: September 30, 2015

Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2019)

Moscow: Russian State University for the Humanities, 2019.

The book includes 64 papers submitted to Dialogue 2019, the International Conference on Computational Linguistics and Intellectual Technologies, and presents a broad spectrum of theoretical and applied research on natural language description, language simulation, and the creation of applied computer technologies. ...

Added: October 16, 2019

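Квантитативные методы в диахронических корпусных исследованиях: конструкции с предикативами и дативным субъектом [Quantitative methods in diachronic corpus studies: constructions with predicatives and dative subjects]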

Bonch-Osmolovskaya A. A., Компьютерная лингвистика и интеллектуальные технологии 2015 Vol. 1 No. 14(21) P. 80–95

The paper proposes new approaches to the problem of Russian dative subjects in predicative and adjective constructions. The core idea of the research is to study the distribution of dative subject constructions with predicative and adjective forms that potentially can be used in such constructions. The methodological novelty of the approach is manifested in the ...

Added: April 15, 2015

Обобщения, ориентированные на исходную точку деривации vs. на продукт деривации, в описании синтаксических процессов [Source-oriented vs. product-oriented generalizations in the description of syntactic processes]

Letuchiy A., Acta Linguistica Petropolitana. Труды института лингвистических исследований 2014 Vol. 10 No. 2 P. 292–330

The article argues that Bybee's opposition between product-based and source-based generalizations is relevant for syntax. Russian corpus data are used. ...

Added: October 6, 2014

Электронная база вариативных явлений [An electronic database of variation phenomena]

Dobrushina N., Стаферова Д. А., Белоконь А. А., Slovĕne 2018 No. 1 P. 424–436

The paper is an overview of the Repository of Variationist Research (https://vastry.ru/), an online storage and interactive plotting tool for quantitative sociolinguistic data. The paper describes a number of sociolinguistic experiments from which the data come and outlines the Repository and the toolkit it provides to its users. ...

Added: August 18, 2018

Historical development of labile verbs in modern Russian

Letuchiy A., Linguistics 2015 Vol. 53 No. 3 P. 611–647

The article deals with the phenomenon of lability (ambitransitivity), in other words, the ability of a verb to be either transitive or intransitive. I analyze the historical development of verbs which are currently labile in modern Russian. The main group of Russian labile verbs contains verbs of motion. On the basis of corpus and dictionary ...

Added: February 8, 2015

Материалы к корпусной грамматике русского языка [Materials for a corpus grammar of Russian]

St. Petersburg: Издательство Нестор-История, 2018.

The volume is the third issue of a corpus-based grammar of Russian. It deals with parts of speech and, more generally, with formal classes of the lexicon. It comprises descriptive papers on individual parts of speech and minor word classes. ...

Added: November 4, 2018

Русский язык в интернет-коммуникации: лингвокогнитивный и прагматический аспекты [The Russian language in internet communication: linguo-cognitive and pragmatic aspects]

Радбиль Т. Б., Рацибурская Л. В., Щеникова Е. В. et al., Moscow: Флинта, 2021.

The collective monograph «Русский язык в интернет-коммуникации: лингвокогнитивный и прагматический аспекты» analyzes the current state of the Russian language on the Internet and offers a linguocultural interpretation of active processes in internet communication through the prism of linguo-cognitive and linguo-pragmatic approaches ...

Added: April 13, 2021

Проблемы обработки естественного языка в диалоговых системах [Problems of natural language processing in dialogue systems]

Klyshinskiy E., Жеребцова Ю., Чижик А., Системный администратор 2019 No. 10 P. 82–91

The field of dialogue systems and conversational agents is currently one of the rapidly growing research areas in artificial intelligence applications. Business and industry are showing increasing interest in implementing intelligent conversational agents in their products. Many recent studies have tended to focus on the possibility of developing task-oriented systems which are able to have long ...

Added: October 26, 2019

О псевдоидентификации в русском языке (на примере обозначений человека в русских литературных текстах) [On pseudo-identification in Russian (based on designations of persons in Russian literary texts)]

Botchkarev A., Вестник Новосибирского государственного университета. Серия: Лингвистика и межкультурная коммуникация 2020 Vol. 18 No. 2 P. 5–12

There is no way to identify an animate object other than to describe its specific characteristics which necessarily look like deviations from the normal “average” pattern, named here paragon, in which the Axiological Standard of a human group is fixed. Of particular heuristic interest is, in this regard, the logical pattern, often used in Russian for describing such ...

Added: July 28, 2020

Russian verboids: A case study in expressive vocabulary

Nikitina T., Linguistics 2012 Vol. 50 No. 2 P. 165–189

Like other Balto-Slavic languages, Russian makes extensive use of verboids, a class of deverbal formations lacking inflectional and derivational markers. The use of verboids is characteristic of oral narration and is typically accompanied by gesture. This article surveys the properties of verboids based on data from the Russian National Corpus. It shows that verboids bear ...

Added: April 23, 2013

L’infinitif et la forme finie des verbes dans les subordonnées de but en russe et en français [The infinitive and the finite form of verbs in purpose clauses in Russian and French]

Letuchiy A., Nikishina E. A., Journal of French Language Studies 2021

The article describes the distribution of verb forms in Russian and French purpose constructions. In the default co-reference type, the PRO of infinitive refers to the subject of the matrix clause in both languages, but the direct object can also control the PRO, both in Russian and French. This happens when the object of the ...

Added: November 1, 2019

Извлечение однословных терминов из текстовых коллекций на основе методов машинного обучения [Extraction of single-word terms from text collections using machine learning methods]

Большакова Е.И., Лукашевич Н.В., Нокель М.А., Информационные технологии 2013 No. 7 P. 31–37

The article presents the results of experiments on the automatic extraction of single-word terms from Russian-language texts using machine learning, which makes it possible to combine the statistical and linguistic features of terms. The experiments show that combining features significantly improves term extraction results, and that the resulting feature combination can be applied to an extended text collection without significant loss of quality. ...

Added: October 1, 2014

Автоматическое определение частей речи для русского языка с помощью обучения трансформаций [Automatic part-of-speech tagging for Russian using transformation-based learning]

Kitov V. V., Научные труды Вольного экономического общества России 2014 Vol. 186 P. 228–235

This paper describes the application of the well-known transformation-based learning algorithm for automatic rule generation to the task of part-of-speech tagging. The algorithm is applied to corpora of annotated Russian texts, and the resulting accuracy and the most significant rules are reported. ...

Added: March 16, 2016

Лики билингвизма [Faces of bilingualism]

St. Petersburg: Златоуст, 2016.

This book is a collection of papers written by Russian and foreign linguists to highlight the different aspects of bilingualism. Much attention is paid to the early simultaneous and successive bilingualism in children; however, adults speaking several languages in natural settings as well as in classroom are also considered. Some chapters are concentrated on language attrition — an ...

Added: October 2, 2016

Корпусные инструменты в грамматических исследованиях русского языка [Corpus tools in grammatical studies of Russian]

Lyashevskaya O., Moscow: Языки славянской культуры, 2016.

Corpus linguistics can be broadly defined in terms of two partially overlapping research dimensions. On the one hand, corpus linguistics is knowledge of how to compile and annotate linguistic corpora. On the other hand, corpus linguistics is a family of qualitative and quantitative methods of language study based on corpus data. The book presents ...

Added: March 26, 2015

Закон о государственном языке Российской Федерации: история обсуждения и поправок [The law on the state language of the Russian Federation: a history of debate and amendments]

Krongauz M., Zeitschrift für Slavische Philologie 2016 Vol. 72 No. 2 P. 255–269

The law “On the state language of the Russian Federation,” adopted in 2005, became a reflection of debates about the Russian language at the beginning of the twenty-first century and caused frustration in both conservative and liberal segments of society. On multiple occasions, attempts were made to amend the law or enact additional laws aimed ...

Added: October 22, 2017

Предикативное согласование со словами ряд, половина, часть, множество в современном русском языке [Predicate agreement with the words r’ad, polovina, chast’, mnozestvo in modern Russian]

Kuvshinskaya Y. M., Сибирский филологический журнал 2019 No. 2 P. 189–215

The work deals with strategies for predicate agreement with quantified noun phrases headed by nouns. In Russian, as in other Slavic languages, predicate agreement with quantified noun phrases allows singular or plural forms of the predicate. For sentences with the quantifier nouns r’ad, polovina, chast’, mnozestvo, three agreement strategies are possible: predicate agrees with ...

Added: September 8, 2019

Понятие «банкротство» в координатах правовой лингвистики: русско-англо-французские аппроксимации

Vlasenko S. V., Галимов А. Р., Вестник Тверского государственного университета. Серия: Филология 2012 Vol. 10 No. 2 P. 21–28

«Bankruptcy» Concept Within the Legal Linguistics Coordinates: Russian–English–French Approximations. The article addresses the notion of bankruptcy as perceived by speakers of present-day Russian, English and French, both lawyers and participants in professional communication from other fields. The semantic structure of the term is identified based on its lexicographic and regulatory definitions. ...

Added: October 4, 2012

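Изменения согласных по месту и способу образования на стыках слов в некоторых двухфонемных сочетаниях [Changes in place and manner of articulation of consonants at word boundaries in some two-phoneme clusters]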

Duryagin P., Известия Юго-Западного государственного университета. Серия: Лингвистика и педагогика 2015 No. 2 P. 78–88

The paper presents the results of a phonetic experiment on changes in place and manner of articulation in four biphonemic consonant clusters in Modern Standard Russian. The article reviews the rules for assimilation and coarticulation of consonants and compares their application in internal and external sandhi positions. ...

Added: October 15, 2016

Investigaciones comparadas ruso-españolas: aspectos teoricos y metodologicos [Russian–Spanish comparative studies: theoretical and methodological aspects]

Granada: Jizo Ediciones, 2011.

The conference proceedings contain the full texts of papers on the topics “Russian–Spanish comparative studies”, “The image of Russia and Spain in literature, history and cultural studies”, and “The Russian and Spanish languages in the theory and practice of translation”. ...

Added: February 26, 2013