RuBQ: A Russian Dataset for Question Answering over Wikidata

Korablinov V., Braslavski P.

doi:10.1007/978-3-030-62466-8_7

P. 97–110.

The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.

The freely available dataset will be of interest for a wide community of researchers and practitioners in the areas of Semantic Web, NLP, and IR, especially for those working on multilingual question answering. The proposed dataset generation pipeline proved to be efficient and can be employed in other data annotation projects.

Language: English

Keywords: Knowledge base question answering

Publication based on the results of:

Development of Mathematical Models and Methods for Recommender Systems and Natural Language Processing (2020)

In book

The Semantic Web – ISWC 2020: 19th International Semantic Web Conference, Athens, Greece, November 2–6, 2020, Proceedings

Vol. 2. Springer, 2020.

RuBQ 2.0: An Innovated Russian Question Answering Dataset

Rybin I., Korablinov V., Efimov P. et al., in: The Semantic Web: 18th International Conference, ESWC 2021, Virtual Event, June 6–10, 2021, Proceedings. Springer, 2021. P. 532–547.

The paper describes the second version of RuBQ, a Russian dataset for knowledge base question answering (KBQA) over Wikidata. Whereas the first version builds on Q&A pairs harvested online, the extension is based on questions obtained through search engine query suggestion services. The questions underwent crowdsourced and in-house annotation in a quite different fashion compared ...

Added: July 23, 2021