TAPE: Assessing Few-shot Russian Language Understanding

P. 2472–2497.
Taktasheva E., Shavrina T., Fenogenova A., Shevelev D., Katricheva N., Tikhonova M., Akhmetgareeva A., Zinkevich O., Bashmakova A., Iordanskaia S., Spiridonova A., Kurenshchikova V., Artemova E., Mikhailov V.

Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six more complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts, logic and commonsense knowledge. TAPE’s design focuses on systematic zero-shot and few-shot NLU evaluation: (i) linguistic-oriented adversarial attacks and perturbations for analyzing robustness, and (ii) subpopulations for nuanced interpretation. A detailed analysis of the autoregressive baselines indicates that simple spelling-based perturbations affect performance the most, while paraphrasing the input has a much smaller effect. At the same time, the results demonstrate a significant gap between the neural and human baselines for most tasks. We publicly release TAPE (https://tape-benchmark.com) to foster research on robust LMs that can generalize to new tasks when little to no supervision is available.

Language: English
Keywords: NLP, language modeling, benchmark

In book

Findings of the Association for Computational Linguistics: EMNLP 2022
Association for Computational Linguistics, 2022.
Similar publications
Granular computing-based deep learning for text classification
Behzadidoost R., Mahan F., Izadkhah H., Information Sciences 2024 Vol. 652 Article 119746
Granular computing involves a comprehensive process that encompasses theories, methodologies, and techniques to solve complex problems, rather than being just an algorithm. As the volume of generated data continues to grow rapidly, data-driven problems have become increasingly complex. Although deep learning models have outperformed traditional machine learning models in solving complex problems, there is still room for enhancing their performance. ...
Added: March 12, 2026
HoTPP benchmark: Are we good at the long horizon events forecasting?
Karpukhin I., Shipilov F., Savchenko A., Neurocomputing 2026 Vol. 672 Article 132771
Forecasting multiple future events within a given time horizon is essential for applications in finance, retail, social networks, and healthcare. This problem is typically addressed using Marked Temporal Point Processes (MTPP), which provide a principled framework for modeling both event timing and event labels. While most existing research focuses on predicting only the next event, forecasting distant future ...
Added: February 25, 2026
30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, Kanazawa, Japan, July 4–6, 2025, Proceedings, Part I. Natural Language Processing and Information Systems. (LNCS, volume 15836)
Springer, 2025.
The two-volume set LNCS 15836 and 15837 constitutes the proceedings of the 30th International Conference on Applications of Natural Language to Information Systems, NLDB 2025, held in Kanazawa, Japan, during July 4–6, 2025. The 33 full papers, 19 short papers and 2 demo papers presented in this volume were carefully reviewed and selected from 120 submissions. ...
Added: February 3, 2026
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
INCOMA Ltd, 2021.
Added: January 28, 2026
ComputAgeBench: Epigenetic Aging Clocks Benchmark
Kriukov D., Efimov E., Kuzmina E. et al., in: KDD '25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Volume 2. Association for Computing Machinery (ACM), 2025. P. 5560–5570.
The success of clinical trials of longevity drugs relies heavily on identifying integrative health and aging biomarkers, such as biological age. Epigenetic aging clocks predict the biological age of individuals using their DNA methylation profiles, commonly retrieved from blood samples. However, there is no standardized methodology to validate and compare epigenetic clock models. We propose ComputAgeBench, ...
Added: January 12, 2026
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Association for Computational Linguistics, 2025.
Added: November 17, 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton R., Mikhalchuk M., Rahmatullaev T. et al., in: Findings of the Association for Computational Linguistics: NAACL 2025. Association for Computational Linguistics, 2025. P. 7757–7764.
We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis ...
Added: November 6, 2025
Well-Being Research Using Advanced Natural Language Processing (NLP) Methods: Prospects and Limitations
Voevodina E., Современная зарубежная психология 2025 Vol. 14 No. 3 P. 172–181
Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, criticized for poor ecological validity, limited information yield, and inadequate capture of multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...
Added: October 9, 2025
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Tartu: University of Tartu Library, 2025.
The third workshop on resources and representations for under-resourced languages and domains was held in Tallinn, Estonia, on March 2nd, 2025. The workshop was conducted in person but also provided an option for online participation. In alignment with the goals of the previous two workshops in 2020 and 2023, RESOURCEFUL-2025 explored the role of resource ...
Added: July 17, 2025
Evaluating the Pragmatic Competence of Large Language Models in Detecting Mitigated and Unmitigated Types of Disagreement
Shulginov V., Hasan Berkcan Şimşek, Sergei Kudriashov et al., in: Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue” (2025). Issue 23. [s.n.], 2025. P. 345–360.
This study presents a framework for evaluating the effectiveness of large language models (LLMs) in detecting disagreement across a wide range of pragmatic strategies, from mitigated forms to overt verbal aggression. Special attention is given to complex cases of implicit manifestations of irony and sarcasm, which pose significant challenges for both automated analysis and interpersonal communication. ...
Added: April 30, 2025
Bi-objective Workflow Scheduling in the Cloud: What is the Real State-of-the-Art?
Yury Semenov, Oleg Sukhoroslov, in: Supercomputing. 10th Russian Supercomputing Days, RuSCDays 2024, Moscow, Russia, September 23–24, 2024, Revised Selected Papers, Part II. Springer, 2025. P. 20–31.
Workflow scheduling in the cloud is a challenging multi-objective optimization problem where an efficient scheduling algorithm is required to optimize both performance and cost. Despite the huge body of work on designing workflow scheduling algorithms, the differences in the experiment settings, VM instances, sets of baseline algorithms, and the choice of reference point for hypervolume ...
Added: April 25, 2025
OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities
Razzhigaev A., Kurkin M., Goncharova E. et al., in: Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP. Association for Computational Linguistics, 2024. P. 183–195.
We introduce OmniDialog — the first trimodal comprehensive benchmark grounded in a knowledge graph (Wikidata) to evaluate the generalization of Large Multimodal Models (LMMs) across three modalities. Our benchmark consists of more than 4,000 dialogues, each averaging 10 turns, all annotated and cross-validated by human experts. The dialogues in our dataset are designed to prevent ...
Added: February 21, 2025
MERA: A Comprehensive LLM Evaluation in Russian
Fenogenova A., Chervyakov A., Martynov N. et al., in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok: Association for Computational Linguistics, 2024. P. 9920–9948.
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). However, despite researchers’ attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, ...
Added: February 17, 2025
RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs
Taktasheva E., Bazhukov M., Koncha K. et al., in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2024. P. 9268–9299.
Minimal pairs are a well-established approach to evaluating the grammatical knowledge of language models. However, existing resources for minimal pairs address a limited number of languages and lack diversity of language-specific grammatical phenomena. This paper introduces the Russian Benchmark of Linguistic Minimal Pairs (RuBLiMP), which includes 45k pairs of sentences that differ in grammaticality and ...
Added: January 2, 2025