SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis

Efimov P., Chertok A., Boytsov L., Braslavski P.

doi:10.1007/978-3-030-58219-7_1

P. 3–15.

The paper presents SberQuAD, a large Russian reading comprehension (RC) dataset created similarly to English SQuAD. SberQuAD contains about 50K question-paragraph-answer triples and is seven times larger than the next competitor. We provide its description, a thorough analysis, and baseline experimental results. We scrutinized aspects of the dataset that can affect task performance: question/paragraph similarity, misspellings in questions, answer structure, and question types. We applied five popular RC models to SberQuAD and analyzed their performance. We believe our work makes an important contribution to research in multilingual question answering.

Language: English

Keywords: evaluation, reading comprehension, Russian language resources, multilingual question answering

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

Experimental IR Meets Multilinguality, Multimodality, and Interaction

Springer, 2020.

Developing Plurilingual Competence in High School (Based on Unified State Exam Texts, Task No. 12)

Smirnova A., Гордиенко Т. А., in: ЧЕЛОВЕК И ТЕКСТ (Human and Text): Proceedings of the VIII International Linguocultural Conference. Ulyanovsk: Ulyanovsk State University, 2026. P. 185–191.

The article sets out to examine the plurilingual potential of reading texts (Task 12) of the Russian Unified State Examination (USE) in English. Based on thematic analysis of 47 texts from the open task bank, the authors investigate the extent to which the content of the texts fosters the development of plurilingual competence. The findings ...

Added: June 29, 2026

Social Meaning in Postcard Texts (Based on the «Пишу тебе» Corpus of Postal Correspondence)

Куликова В. А., Tomsk State University Journal of Philology. 2026. No. 100. P. 53–73.

Drawing on the digital postcard corpus «Пишу тебе» ("I Am Writing to You"), the study examines the linguistic features of expressing social meaning in interpersonal written communication. Corpus analytics tools are used in combination with methods of lexico-semantic, contextual, and pragmatic analysis. The study characterizes explicit forms of expressing social meaning and the evaluative nature of contexts that carry social meaning, and it identifies implicit manifestations of social meaning in etiquette formulas, by the frequency ...

Added: November 2, 2025

RePlay: a Recommendation Framework for Experimentation and Production Use

Vasilev A., Volodkevich A., Kulandin D. et al., in: RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems. Association for Computing Machinery (ACM), 2024. P. 1191–1194.

Added: November 24, 2024

Revisiting the performance evaluation of knowledge-aware recommender systems: are we making progress?

Ananyeva M., Lashinin O., Kuznetsova M., in: Proceedings of the Fourth Knowledge-aware and Conversational Recommender Systems Workshop co-located with 16th ACM Conference on Recommender Systems (RecSys 2022). Vol. 3294. CEUR Workshop Proceedings, 2022. P. 22–28.

Knowledge-aware recommender systems incorporate side information to improve recommendation performance. The authors of new algorithms are usually focused on developing new ideas behind the proposed methods and comparing their models with existing knowledge-aware recommender models. Meanwhile, some commonly used state-of-the-art general top-n recommender models are ignored as potential baselines. In this study, we compare previously ...

Added: January 5, 2024

The Expressive and Evaluative Potential of Derivational Neologisms in Modern Media Texts

Торопкина В.А., in: The Scholarly Legacy of B. N. Golovin in Light of Current Problems of Modern Linguistics (on the 100th Anniversary of B. N. Golovin's Birth): Collection of Articles from the International Scientific Conference. Nizhny Novgorod: Деком, 2016. P. 370–374.

The article discusses the concepts of evaluation and expressivity in modern media texts and characterizes derivational neologisms as a means of forming these linguistic categories. It analyzes how different types of neologisms convey expressive and evaluative semantics within media and political discourse. ...

Added: September 30, 2020

Conclusion — What Have We Learnt, and Where Do We Go from Here?

Kriesi H., Morlino L., in: How Europeans View and Evaluate Democracy. Oxford: Oxford University Press, 2016. Ch. 14. P. 307–326.

This chapter summarizes the detailed empirical results of the volume. We have found very strong evidence that the basic principles of liberal democracy are universally endorsed across Europe, and that Europeans also embrace direct and social democracy. The legitimacy in terms of liberal and direct democracy is high across Europe, the real democratic deficit in ...

Added: October 22, 2018

Reproducing Network Structure: A Comparative Study of Random Graph Generators

Drobyshevskiy M., Turdakov D., Kuznetsov S., in: Proceedings of the Ivannikov ISPRAS Open Conference, 30 November – 1 December 2017, Moscow, Russian Federation. Los Alamitos: IEEE Computer Society, 2017. P. 83–89.

The problem of generating graphs similar to a given one arises in such tasks as data anonymization and significance testing of network mining tools. The main challenges lie in the rich diversity of graph domains emerging in various research areas and the uncertainty about which graph properties should be reproduced. Our central statement is that a good ...

Added: February 4, 2018