RuBQ 2.0: An Innovated Russian Question Answering Dataset

Rybin I.; Korablinov V.; Efimov P.; Braslavski P.

doi:10.1007/978-3-030-77385-4_32



P. 532–547.


The paper describes the second version of RuBQ, a Russian dataset for knowledge base question answering (KBQA) over Wikidata. Whereas the first version builds on Q&A pairs harvested online, the extension is based on questions obtained through search engine query suggestion services. The questions underwent crowdsourced and in-house annotation, following a procedure quite different from that of the first edition. The dataset doubled in size: RuBQ 2.0 contains 2,910 questions along with the answers and SPARQL queries. The dataset also incorporates answer-bearing paragraphs from Wikipedia for the majority of questions. The dataset is suitable for the evaluation of KBQA, machine reading comprehension (MRC), hybrid question answering, as well as semantic parsing. We provide an analysis of the dataset and report several KBQA and MRC baseline results. The dataset is freely available under the CC-BY-4.0 license.

Language: English


Keywords: Knowledge base question answering

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

The Semantic Web: 18th International Conference, ESWC 2021, Virtual Event, June 6–10, 2021, Proceedings

Springer, 2021.

RuBQ: A Russian Dataset for Question Answering over Wikidata

Korablinov V., Braslavski P., in: The Semantic Web – ISWC 2020: 19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings. Vol. 2. Springer, 2020. P. 97–110.

The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection ...

Added: December 8, 2020