Evaluation of Sentence Embedding Models for Natural Language Understanding Problems in Russian

Popov D., Pugachev A., Svyatokum P., Svitanko E., Artemova E.

doi:10.1007/978-3-030-37334-4_19

P. 205–217.

We investigate the performance of sentence embedding models on several tasks for the Russian language. Our comparison covers multiple choice question answering, next sentence prediction, and paraphrase identification. We employ FastText embeddings as a baseline and compare them with ELMo and BERT embeddings. We conduct two series of experiments, using both unsupervised (i.e., based on a similarity measure only) and supervised approaches for the tasks. Finally, we present datasets for multiple choice question answering and next sentence prediction in Russian.
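The unsupervised setting mentioned in the abstract (decisions based on a similarity measure only) can be illustrated with a minimal sketch. The abstract does not say which similarity measure or pooling the authors use, so cosine similarity and mean pooling are assumptions here, as are the function names, the toy vectors, and the 0.8 threshold. Real FastText, ELMo, or BERT vectors would replace the hand-written lists.

```python
import math


def mean_vector(word_vectors):
    """Average equal-length word vectors into one sentence vector (assumed pooling)."""
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_answer(question_vec, candidate_vecs):
    """Multiple choice: return the index of the candidate most similar to the question."""
    scores = [cosine(question_vec, cand) for cand in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


def is_paraphrase(vec_a, vec_b, threshold=0.8):
    """Paraphrase identification: similarity at or above a threshold (0.8 is arbitrary)."""
    return cosine(vec_a, vec_b) >= threshold


# Toy two-dimensional "embeddings", invented purely for illustration.
question = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
best = pick_answer(question, candidates)
```

In this toy run `best` is the index of the second candidate, the only one pointing in nearly the same direction as the question vector. A supervised variant would instead feed the same sentence vectors to a trained classifier.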

Keywords: paraphrase, multiple choice question answering, next sentence prediction, sentence embedding, paraphrase identification, sentence embedding model

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2019)

In book

Analysis of Images, Social Networks and Texts. 8th International Conference AIST 2019

Springer, 2019.

Teaching paraphrasing strategies for work with study texts in English classes.

Кочеткова Л. Ю., Евразийский гуманитарный журнал 2020 No. 4 P. 76–81

An important skill for students learning a foreign language is the ability to paraphrase a study text. This skill lets learners avoid reproducing the author's sentences verbatim and show individuality of speech and thought by using the lexical items and grammatical constructions they know. Paraphrasing techniques are ideally suited to training language skills and developing spoken language, making the latter more expressive. However, most undergraduate students ...

Added: November 9, 2022

The function of the paraphrase of F. Dostoevsky's novel "Demons" in G. Sapgir's novella "Armageddon"

Pavlovets M., in: Сусрети народа и култура. Међународни тематски зборник. [s.n.], 2015. P. 147–164.

This article analyses a late work by G. Sapgir, his novella "Armageddon". The apocalyptic themes of the work are polemically directed against the eschatology of the "historical avant-garde" of the first third of the 20th century and the eschatological expectations of "nuclear war" in the late Soviet period. It rather takes the form of a reflection on the "finalistic" concept of culture, which ...

Added: August 24, 2015

To copy or to retell? On retelling and variation in the manuscript tradition of Bjarnar saga Hítdælakappa

Daria G., Вестник РГГУ. Серия «Литературоведение. Языкознание. Культурология» 2020 No. 4 P. 28–44

The article examines the problem of variation in the manuscript tradition of the "Saga of Björn, Champion of the Hít Valley" (Bjarnar saga Hítdælakappa). The two main versions of the saga have usually been regarded as an abridged and an expanded redaction. However, comparing them shows that, despite a large amount of identical text, most of the variant readings are variations of equal standing, and neither version shows a tendency toward contraction. This allows ...

Added: April 23, 2020

Retelling as the historian's art: on the manuscript transmission of historical writing in Old Rus' and Old Scandinavia

Daria G., Славяноведение 2020 No. 4 P. 30–49

Up to the year 1016, the versions of the Primary Chronicle (Povest' vremennykh let, PVL) in its main Kievan manuscripts and in the Novgorod First Chronicle of the Younger Recension (N1Lml.) are usually related either as close copies or as expanded and condensed redactions of a single text. This rule is broken only in the account of the conflict between Yaroslav and the Novgorodians under the years 1015–1016. Here the PVL and the N1Lml. are related rather as retellings ...

Added: September 28, 2020

A new Russian paraphrase corpus. Paraphrase identification and classification based on different prediction models

Pronoza E., Yagunova E., in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016. Vol. 1. Issue 9623. Springer Publishing Company, 2018. P. 573–587.

Our main objectives are constructing a paraphrase corpus for Russian and developing paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies, which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the ...

Added: October 30, 2020