CLEVR-BT-DB: a benchmark dataset to evaluate the reasoning abilities of deep neural models in visual question answering problems

Ch. 1316909.
Latipov I., Borevskiy A., Kertesz-Farkas A.

Deep learning-based machine reasoning and visual question answering models achieve near-human performance on their respective datasets; however, their performance drops dramatically under domain shift, suggesting that the models fail to generalize to the level of human-like reasoning. In this paper we present a new CLEVR-like dataset of image-question pairs for evaluating the visual reasoning capability of deep models. The objects in the images are arranged so that the first half of the question is ambiguous and multiple answers seem correct up to that point; the second half of the question then resolves the ambiguity, so that the whole visual question answering (VQA) task becomes unambiguous and has a unique answer. Deep models therefore need to handle ambiguity internally during their reasoning process. They can do so either by traversing a graph (or tree) in the search space with a backtracking technique, or by refining a candidate set of possibly correct answers, iteratively eliminating incorrect ones as the reasoning proceeds. We call this dataset CLEVR with Back-Tracking Database (CLEVR-BT-DB). It consists of 2,500 images and 10,000 questions in the same format as the standard CLEVR dataset and is available at https://huggingface.co/datasets/Aborevsky01/CLEVR-BT-DB. The code to generate additional data is available at https://github.com/AFigaro/CLEVR_BT_DB. We tested MDETR, a recent deep model for VQA from Meta Research: it achieves an accuracy of 99.7% on the standard CLEVR dataset but only 28.01% on our CLEVR-BT-DB dataset.
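The candidate-elimination strategy mentioned in the abstract can be illustrated with a small toy sketch. This is not the authors' method, the MDETR model, or the CLEVR-BT-DB data format: the `Obj` schema, the `refine` helper, the scene, and the question are all hypothetical, chosen only to show how a question whose first half is ambiguous becomes unambiguous once the second half is applied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Obj:
    """Toy scene object; hypothetical schema, not the CLEVR-BT-DB format."""
    shape: str
    color: str
    right_neighbor: str  # shape of the object directly to its right, "" if none


Clause = Callable[[Obj], bool]


def refine(candidates: List[Obj], clauses: List[Clause]) -> List[List[Obj]]:
    """Apply question clauses in order; return the candidate set after each one."""
    history = []
    for clause in clauses:
        # Eliminate every candidate that fails the current clause.
        candidates = [obj for obj in candidates if clause(obj)]
        history.append(candidates)
    return history


# Hypothetical scene with two cubes, so "the cube" alone is ambiguous.
scene = [
    Obj("cube", "red", "sphere"),
    Obj("cube", "blue", "cylinder"),
    Obj("cylinder", "green", ""),
]

# Hypothetical question: "What color is the cube / that is left of the cylinder?"
clauses = [
    lambda o: o.shape == "cube",               # first half: two candidates remain
    lambda o: o.right_neighbor == "cylinder",  # second half: one candidate remains
]

history = refine(scene, clauses)
answer = history[-1][0].color

print([len(step) for step in history])  # [2, 1]
print(answer)                           # blue
```

After the first clause two candidates survive, so any answer given at that point would be a guess; only the second clause narrows the set to a single object. A backtracking variant would instead commit to one cube, fail on the second clause, and return to try the other.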

Language: English
Keywords: visual question answering, machine reasoning
Publication based on the results of:
Robust and accurate analysis of the data modalities in mass spectrometry (2024)

In book

Proceedings Volume 13169. Fifth International Conference On Computer Vision And Computational Intelligence (CVCI 2024) 29-31 January 2024, Bangkok, Thailand
SPIE, 2024.
Similar publications
RuCLEVR: A Russian Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Biryukova K., Chelnokova D., Erkenova J. et al., Communications in Computer and Information Science 2024 Vol. 2364 CCIS P. 109 – 121
Added: February 25, 2026
Analyzing the Robustness of Vision & Language Models
Shirnin A., Andreev N., Potapova S. et al., IEEE/ACM Transactions on Speech and Language Processing 2024 Vol. 32 P. 2751–2763
We present an approach to evaluate the robustness of pre-trained vision and language (V&L) models to noise in input data. Given a source image/text, we perturb it using standard computer vision (CV) / natural language processing (NLP) techniques and feed it to a V&L model. To track performance changes, we explore the problem of visual ...
Added: July 19, 2024
Error Analysis for Visual Question Answering
Podtikhov A., Shaban M., Kovalev A. et al., in: Advances in Neural Computation, Machine Learning, and Cognitive Research IV. Selected Papers from the XXII International Conference on Neuroinformatics 2020. Studies in Computational Intelligence. Vol. 925. Springer, 2021. P. 283–292.
In recent years, the task of visual question answering (VQA) at the intersection of computer vision and natural language processing is gaining interest in the scientific community. Even though modern systems achieve good results on standard datasets, these results are far from what is achieved in Computer Vision or Natural Language Processing separately, for example, ...
Added: October 30, 2020