RuSentEval: Linguistic Source, Encoder Force!

P. 43-65.

Mikhailov V., Taktasheva E., Sigdel E. S., Artemova E.

The success of pre-trained transformer language models has generated a great deal of interest in how these models work and what they learn about language. However, prior research in the field is mainly devoted to English, and little is known regarding other languages. To this end, we introduce RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been explored yet. We apply a combination of complementary probing methods to explore the distribution of various linguistic properties in five multilingual transformers for two typologically contrasting languages, Russian and English. Our results provide intriguing findings that contradict the common understanding of how linguistic knowledge is represented, and demonstrate that some properties are learned in a similar manner despite the language differences.

Language: English

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing

Association for Computational Linguistics, 2021

Applying Quantitative Corpus Methods to Identify Church Slavonicisms in Modern Russian

Litvintseva K., Lyashevskaya O., Vestnik Pravoslavnogo Svyato-Tikhonovskogo gumanitarnogo universiteta. Seriya 3: Filologiya 2017 Vol. 53 P. 43-55

The starting point of the study is the hypothesis of a discursive proximity between Church Slavonic and the Christian religious discourse of modern Russian. Analysing lexical structure with quantitative corpus methods, we show that the latter is closer to Church Slavonic than mainstream modern Russian is. This can serve as a proof of ...

Added: September 27, 2017

“Church Slavonicism” as a Lexicographic Label

Litvintseva K., Vestnik Pravoslavnogo Svyato-Tikhonovskogo gumanitarnogo universiteta. Seriya 3: Filologiya 2016 No. 2 (47) P. 26-44

This paper analyzes how dictionaries of the Russian language treat vocabulary of Church Slavonic origin and vocabulary with related semantics. It addresses two questions: 1. Do dictionary compilers use any label to indicate the Church Slavonic origin of a word or its use in the church sphere? 2. Are the different dictionaries ...

Added: October 5, 2016

“Analytical adjectives” in Russian: are they all adjectives, and are they really analytical?

Andrey Gorbov, Russian linguistics 2016 Vol. 40 No. 2 P. 133-152

The article explores the issue of grammatical description of invariable attributive modifiers of nouns in Russian, including initial components in combinations such as бизнес-план, шоу-бизнес, фитнес-зал, in which the premodifiers formally coincide with nouns recently borrowed from English and used as independent words in Russian. The paper challenges the theory according to which all such elements have identical ...

Added: June 9, 2016

“Russian culture” in Central Asia as a Transethnic Phenomenon

Kosmarski A., Kosmarskaya N., in: Global Russian Cultures. University of Wisconsin Press, 2019. P. 69-93.

The portrait offered above of everyday life in Almaty is only one of the many expressions of the phenomenon treated in this chapter, which we will refer to as “Russophone cultural-linguistic space,” a term we consider synonymous with the formula “Russian culture in Central Asia” offered in this chapter’s title. This linguistic and cultural space, ...

Added: September 17, 2021

The Most Frequently Used Words in Everyday Russian Speech (with Regard to Gender and Communication Settings)

Sherstinova T., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (Moscow, June 1–4, 2016). Issue 15. Moscow: RGGU Publishing House, 2016. P. 616-632.

The paper presents the most frequent words of everyday spoken Russian, which form the upper zones of several word frequency lists compiled from the Russian speech corpus “One Speaker’s Day” (the ORD corpus), which contains real-life recordings of everyday communication. All speech data in the corpus is annotated in terms of communication settings, including ...

Added: October 6, 2018

Global Russian Cultures

Kukulin I., Kosmarski Artyom, Bullock P. R. et al., University of Wisconsin Press, 2019

Is there an essential Russian identity? What happens when "Russian" literature is written in English, by such authors as Gary Shteyngart or Lara Vapnyar? What is the geographic "home" of Russian culture created and shared via the internet? Global Russian Cultures innovatively considers these and many related questions about the literary and cultural life of Russians who ...

Added: September 17, 2021

Developing and Validating an Academic Vocabulary List in Russian: A Computational Approach

Talalakina E., Stukal D., Kamrotov M., Modern Language Journal 2020 Vol. 104 No. 3 P. 618-646

To date, attempts at empirically validating a construct of academic vocabulary in the form of a frequency list in languages other than English remain conspicuously absent in peer‐reviewed journals. This study aims to close this gap by using Russian as a case study to develop an academic vocabulary list and prove its viability through a ...

Added: August 31, 2020

Current Trends in the Distributive Use of Russian Nouns (Based on Corpus Data)

Kuvshinskaya Y. M., in: Russian Grammar: Active Processes in Language and Speech. Yaroslavl: RIO YaGPU, 2019. P. 405-415.

The paper considers the modern distributive usage of Russian subject nouns on the basis of corpus data. The author shows that the grammatical rules for choosing the number form of nouns in distributive usage are not always consistently observed in modern speech. This is due to a number of reasons, ...

Added: December 29, 2019

NB-MLM: Efficient Domain Adaptation of Masked Language Models for Sentiment Analysis

Arefyev N., Kharchev D., Shelmanov A., in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2021. P. 9114-9124.

While Masked Language Models (MLM) are pre-trained on massive datasets, the additional training with the MLM objective on domain or task-specific data before fine-tuning for the final task is known to improve the final performance. This is usually referred to as the domain or task adaptation step. However, unlike the initial pre-training, this step is ...

Added: September 23, 2021

Modern Russian Language. Word Formation

Panteleeva L., RTO SGPI (branch) PGNIU, Tipograf LLC, 2018

This textbook contains seminar materials for the course “Modern Russian Language. Word Formation”. It offers theoretical and practical assignments covering the main thematic blocks of Russian morphemics and word formation. In addition to the seminar materials, the textbook includes outlines for morphemic, word-formation, and etymological analysis, a list of self-assessment questions, and a list of recommended reading. Use of the textbook is not limited to ...

Added: October 31, 2021

Modern Russian Language. Morphology

Panteleeva L., RTO SGPI (branch) PGNIU, Tipograf LLC, 2018

This textbook follows the syllabus of the undergraduate course “Modern Russian Language. Morphology”. The materials can also be used by philology students or students of other faculties studying disciplines such as “Introduction to Linguistics”, “Russian Language”, and “Russian Language with Fundamentals of Linguistics”. The universal character of the edition, given its target audience, is reflected in ...

Added: October 31, 2021

The Russian Language in Russia and Abroad: Studying Active Processes in Language and Speech

Nizhny Novgorod: National Research Lobachevsky State University of Nizhny Novgorod, 2021

The collection presents articles based on the proceedings of the International Scientific Conference “The Russian Language in Russia and Abroad: Studying Active Processes in Language and Speech”, held as part of the philological session “National Codes in Language and Literature” (National Research Lobachevsky State University of Nizhny Novgorod, Institute of Philology and Journalism, October 29–31, 2021). In the articles by leading Russian ...

Added: February 11, 2022

The Dative Radial Category in Old Church Slavonic and Modern Russian

Voloshina E., Poljarnyj Vestnik 2021 Vol. 24 P. 13-32

In this paper, the semantic roles expressed by the Dative case in Modern Russian and Old Church Slavonic are described in terms of radial categories. The corpus data shows that the radial category of the Dative case has changed since Old Church Slavonic. The radial category in Modern Russian is smaller, and it includes fewer ...

Added: December 6, 2021

Artificial Text Detection via Examining the Topology of Attention Maps

Kushnareva L., Cherniavskii D., Mikhailov V. et al., in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2021. P. 635-649.

Added: September 27, 2021

The Adverb as a Modifier of Adjectivized Participles in Modern Russian

Kosheleva D., Lyashevskaya O., in: Humanities Education and Science in a Technical University. Kalashnikov Izhevsk State Technical University, 2017.

The article focuses on the co-occurrence of different types of adverbs with participles of varying degrees of adjectivization in the modern Russian language. Examples of the use of adverbs and participial forms are given. Conclusions are drawn about the role of adverbs in the process of adjectivization. ...

Added: September 27, 2017

The structure of everyday dialogue as the sequence of speech acts

Sherstinova T., in: Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference “Dialogue” (Moscow, May 30 to June 2, 2018). Issue 17(24). Moscow: Russian State University for the Humanities Publishing Center, 2018. P. 637-651.

The structure of Russian everyday dialogue was studied on the basis of 73 microdialogues of everyday speech communication from the “One Day of Speech” corpus (the ORD Corpus). The aim of the research was to find out what types of speech acts commonly initiate and complete everyday dialogues, as well as to reveal the most ...

Added: October 5, 2018

The Russian Language: Its Historical Destiny and Present State: VI International Congress of Researchers of the Russian Language (Moscow, Faculty of Philology, Lomonosov Moscow State University, March 20-23, 2019): Proceedings and Materials

Moscow University Press, 2019

Materials of the VI International Congress of Researchers of the Russian Language “The Russian Language: Its Historical Destiny and Present State” (Moscow, Faculty of Philology, Lomonosov Moscow State University, March 20-23, 2019) ...

Added: October 5, 2020

Investigating a “Weak” Grammatical Constraint with the Methods of Experimental Syntax: The Case of Clauses with the Conjunction chto as Sentential Complements of Nouns

Knyazev M., Rema 2017 No. 1 P. 22-40

Weak grammatical violations, i.e. violations that result in intermediate unacceptability, pose challenges for a formal description of grammar since it remains undecided whether they are the result of a true grammatical constraint or merely an epiphenomenal consequence of processing complexity. Neither informal grammaticality judgements nor corpus data alone can reveal fine-grained distinctions that may be ...

Added: October 19, 2017

Russian indefinite pronoun kakoj-libo: non-standard usage and changes in the semantics

Kuvshinskaya Y. M., Jazykovedny Casopis 2019 No. 2 P. 225-233

The paper deals with the meaning and use of the indefinite pronoun kakoj-libo ‘any/some’ in the modern Russian language. Research based on corpus data revealed non-standard usage of the pronoun kakoj-libo ‘any/some’. The paper describes the main types of deviation and evaluates their pragmatic and semantic effect. Finally, tendencies of change in the semantics and use of these pronouns are ...

Added: December 29, 2019

The “stal byt’” Construction in Russian

Leontieva A., Litvintseva K., Russkii yazyk v nauchnom osveshchenii 2018 No. 35(1) P. 110-132

This article considers the marginal Russian copula construction stat’ + byt’ (‘become’ + ‘be’) + nonverbal predicate and its normative variant without the copula byt’. Both constructions were used in 18th-century Russian, where they were considered normative, and they are still used today in web communications, although possibly without a diachronic succession. ...

Added: November 29, 2017

“Church Slavonicism” as a Linguistic Term

Litvintseva K., Vestnik Orlovskogo gosudarstvennogo universiteta. Seriya: Novye gumanitarnye issledovaniya 2015 No. 6 (47) P. 264-267

The article examines how linguists use the terms Slavonicism, staroslavyanizm (a word derived from Old Slavonic), tserkovnoslavyanizm (a word derived from Church Slavonic), and related terms. The focus is on the term «tserkovnoslavyanizm», as it is particularly important in the study of the Christian discourse of the modern Russian language. The author proposes a definition of «tserkovnoslavyanizm» ...

Added: October 5, 2016

Phonetics of the Modern Russian Literary Language (Phonetics. Phonology. Orthoepy. Graphics. Orthography)

Grishchenko A., Popova M. T., Moscow: MPGU, 2018

The textbook brings together the teaching materials needed for practical mastery of the course “Phonetics. Phonology. Orthoepy. Graphics. Orthography” within the discipline “Modern Russian Language”: some general theoretical points overlooked in existing modern textbooks, the tables and diagram a student needs, and, finally, 150 questions and exercises, as well as a list of theoretical questions for the examination or pass/fail test ...

Added: October 21, 2020