• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Book chapter
  • LM-Polygraph: Uncertainty Estimation for Language Models
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 22, 2026
HSE Graduates AI Project Wins at TECH & AI Awards
Daria Davydova, graduate of the HSE Graduate School of Business and Head of the AI Implementation Unit at the Artificial Intelligence Department of Alfa-Bank, received a prize at the TECH & AI Awards. She was awarded for the best AI solution for optimising business processes. The winners were determined as part of the VII Russian Summit and Awards on Digital Transformation (CDO/CDTO Summit & Awards).
May 20, 2026
HSE University Opens First Representative Office of Satellite Laboratory in Brazil
HSE University-St Petersburg opened a representative office of the Satellite Laboratory on Social Entrepreneurship at the University of Campinas in Brazil. The platform is going to unite research and educational projects in the spheres of sustainable development, communications and social innovations.
May 18, 2026
The 'Second Shift' Is Not Why Women Avoid News
Women are more likely than men to avoid political and economic news, but the reasons for this behaviour are linked less to structural inequality or family-related stress than to personal attitudes and the emotional perception of news content. This conclusion was reached by HSE researchers after analysing data from a large-scale survey of more than 10,000 residents across 61 regions of Russia. The study findings have been published in Woman in Russian Society.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

LM-Polygraph: Uncertainty Estimation for Language Models

P. 446 –461.
Fadeeva E., Vashurin R., Tsvigun A., Vazhentsev A., Petrakov S., Fedyanin K., Daniil Vasilev, Goncharova E., Panchenko A., Panov M., Baldwin T., Shelmanov A.

Recent advancements in the capabilities of large language models (LLMs) have paved the way for a myriad of groundbreaking applications in various fields. However, a significant challenge arises as these models often “hallucinate”, i.e., fabricate facts without providing users an apparent means to discern the veracity of their statements. Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of LLMs. However, to date, research on UE methods for LLMs has been focused primarily on theoretical rather than engineering contributions. In this work, we tackle this issue by introducing LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. Additionally, it introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores, empowering end-users to discern unreliable responses. LM-Polygraph is compatible with the most recent LLMs, including BLOOMz, LLaMA-2, ChatGPT, and GPT-4, and is designed to support future releases of similarly-styled LMs.

Language: English
DOI
Text on another site
Keywords: Uncertainty EstimationLLM

In book

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Singapore: Association for Computational Linguistics, 2023.
Similar publications
Optimizing Computational Infrastructure for Large Language Models in Bioinformatics: A Case Study
Beknazarov N., , in: Parallel Computational Technologies, 19th International Conference, PCT 2025, Moscow, Russia, April 8–10, 2025, Revised Selected Papers. (CCIS, volume 2891)Vol. 2891.: Springer, 2026. P. 3–16.
This paper addresses the challenge of efficiently training Large Language Models (LLMs) on large-scale, sparse omics datasets in high-performance computing (HPC) environments. Using over 1000 BED tracks as a representative data source, we propose a method combining interval-based chunked storage, sparse matrix transformation, and parallel data loading, integrated within a PyTorch Lightning training framework. Our ...
Added: May 19, 2026
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Seleznyov M., Chaichuk M., Ershov G. et al., , in: Findings of the Association for Computational Linguistics: EMNLP 2025.: Association for Computational Linguistics, 2025. P. 20370–20385.
Large Language Models (LLMs) are highly sensitive to subtle, non-semantic variations in prompt phrasing and formatting. In this work, we present the first systematic evaluation of 4 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from Llama, Qwen and Gemma families across 52 tasks from Natural ...
Added: February 3, 2026
Measuring Chemical LLM robustness to molecular representations: a SMILES variation-based framework
Ganeeva V., Khrabrov K., Kadurin A. et al., Journal of Cheminformatics 2025 No. 17 Article 164
The recent integration of natural language processing into chemistry has advanced drug discovery. Molecule representations in language models (LMs) are crucial to enhance chemical understanding. We explored the ability of models to match the same chemical structures despite their different representations. Recognizing the same substance in different representations is an important component of emulating the ...
Added: February 3, 2026
Aspect-Based Sentiment Analysis Using Large Language Models on Museum Visitor Reviews
Anastasia V. Kolmogorova, Elizaveta R. Kulikova, Vladislav V. Lobanov, Supercomputing Frontiers and Innovations 2025 Vol. 12 No. 3 P. 121–140
Museum reviews provide rich insight into visitor preferences and can drive useful change within institutions, yet they have attracted little attention in sentiment research owing to limited commercial interest and the multi-thematic nature of reviews. In this study we analysed over 12 000 reviews in Russian for 15 museum sites collected from nine different platforms. ...
Added: November 30, 2025
AutoJudge: Judge Decoding Without Manual Annotation
Roman Garipov, Fedor Velikonivtsev, Ivan Ermakov et al., , in: 39th Conference on Neural Information Processing Systems (NeurIPS 2025).: NeurIPS, 2025. P. 94605–94642.
We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify the generated tokens that affect the downstream quality of the response, relaxing the distribution match guarantee so that the "unimportant" tokens can be generated faster.Our approach relies ...
Added: November 6, 2025
Strategizing with AI: Insights from a Beauty Contest Experiment
Iuliia Alekseenko, Dagaev D., Sofiia Paklina et al., Journal of Economic Behavior and Organization 2025 Vol. 240 Article 107330
Added: November 6, 2025
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Anton R., Mikhalchuk M., Rahmatullaev T. et al., , in: Findings of the Association for Computational Linguistics: NAACL 2025.: Association for Computational Linguistics, 2025. P. 7757–7764.
We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis ...
Added: November 6, 2025
Исследования благополучия с помощью передовых методов обработки естественного языка (NLP): перспективы и ограничения
Voevodina E., Современная зарубежная психология 2025 Т. 14 № 3 С. 172–181
Context and relevance. Well-being research faces methodological limitations of conventional psychometric measures, criticized for poor ecological validity, limited information yield, and inadequate capture of multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...
Added: October 9, 2025
Оценка моделей LLM по степени готовности решать задачи управления в области ESG
Storchevoy M., Mylnikov L., Чернышев В. В. et al., / SSRN. Серия "Working Papers". 2025.
Внимание к охране природы принимает все большую значимость для бизнеса с одной стороны в связи с ужесточением в природоохранном законодательстве, а с другой в связи с использованием ESG рейтингов при принятии решений о коммерческой деятельности компаний. Составление рейтинга LLM систем, способных оказывать консультационные услуги в области природоохраны и ESG, позволяет осуществить выбор такой системы для ...
Added: September 18, 2025
Цифровой театр абсурда: могут ли нейросети поставить новую научную проблему перед психологией? Кейс-сравнение ChatGPT и DeepSeek
Хашутогова У. П., Berezner T., Poddiakov A., Новые психологические исследования 2025 № 3 С. 100–125
The rapid advancement of artificial intelligence technologies has drawn increasing attention from psychological researchers. While neural networks are being integrated into nearly all domains of human activity, the boundaries of their applicability remain unclear — particularly regarding the originality and practical value of the content they generate. Proponents advocate for their widespread adoption, whereas skeptics ...
Added: September 4, 2025
Interpreting Metaphorical Language: A Challenge to Artificial Intelligence
Skrynnikova I.V., Вестник Волгоградского государственного университета. Серия 2: Языкознание 2025 Vol. 23 No. 5 P. 99–107
In recent years, numerous studies have pointed to the ability of artificial intelligence (AI) to generate and analyze expressions of natural language. However, the question of whether AI is capable of actually interpreting human language, rather than imitating its understanding, remains open. Metaphors, being an integral part of human language, as both a common figure ...
Added: August 1, 2025
Comparative Study of LoRA and Full Fine-Tuning in Large Language Models
E.V. Surikova, E.A. Sabidaeva, , in: Параллельные вычислительные технологии – XIX всероссийская конференция с международным участием, ПаВТ'2025, г. Москва, 8–10 апреля 2025 г. Короткие статьи и описания плакатов.: Челябинск: Издательский центр ЮУрГУ, 2025. P. 90–98.
Added: July 3, 2025
HR-Tech Automation: A Case Study of Resume Design using GenAI Technologies
Suleykin, A., Babenko, R., Panfilov, P., , in: Proceedings of the 35th International DAAAM Virtual Symposium ''Intelligent Manufacturing & Automation''Vol. 1.: NY: DAAAM International Vienna, 2024. Ch. 20 P. 0157–0164.
Added: April 5, 2025
OmniDialog: A Multimodal Benchmark for Generalization Across Text, Visual, and Audio Modalities
Razzhigaev A., Kurkin M., Goncharova E. et al., , in: Proceedings of the 2nd GenBench Workshop on Generalisation (Benchmarking) in NLP.: Association for Computational Linguistics, 2024. P. 183–195.
We introduce OmniDialog — the first trimodal comprehensive benchmark grounded in a knowledge graph (Wikidata) to evaluate the generalization of Large Multimodal Models (LMMs) across three modalities. Our benchmark consists of more than 4,000 dialogues, each averaging 10 turns, all annotated and cross-validated by human experts. The dialogues in our dataset are designed to prevent ...
Added: February 21, 2025
MERA: A Comprehensive LLM Evaluation in Russian
Fenogenova A., Chervyakov, A., Martynov N. et al., , in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2024Vol. 1: Long Papers.: Bangkok: Association for Computational Linguistics, 2024. P. 9920–9948.
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). However, despite researchers’ attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, ...
Added: February 17, 2025
Your Transformer is Secretly Linear
Razzhigaev A., Mikhalchuk M., Goncharova E. et al., , in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2024Vol. 1: Long Papers.: Bangkok: Association for Computational Linguistics, 2024. P. 5376–5384.
This paper reveals a novel linear characteristic exclusive to transformer decoders, including models like GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering an almost perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed, due to a consistently low transformer layer output ...
Added: February 17, 2025
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
Razzhigaev A., Mikhalchuk M., Goncharova E. et al., , in: Findings of the Association for Computational Linguistics: EACL 2024.: Association for Computational Linguistics, 2024. P. 868–874.
Added: February 17, 2025
ChatGPT, текст, информация: критический анализ
Komashko M. N., Труды по интеллектуальной собственности 2024 Т. 50 № 3 С. 118–128
The paper deals with theory and practice issues related to such type of artificial intelligence as large language models, in particular, ChatGPT. The main attention is paid to spheres of human activity, in which the exchange of information stated in the form of text is of the greatest importance: science, education and journalism (media sphere). The ...
Added: December 29, 2024
Automated Speech Act Annotation in a Russian Spoken Corpus Using Large Language Models: A Comparative Study
Sherstinova T., Viktoria Firsanova, , in: PROCEEDING OF THE 36TH CONFERENCE OF FRUCT ASSOCIATION.: [б.и.], 2024. P. 912–920.
The research focuses on the automatic annotation of a linguistic corpus using large language models (LLMs). Annotating a corpus is a crucial step in its creation, as it determines the practical scope and applications of the resource being developed. This study explores the annotation of oral speech transcripts at the pragmatic level using speech acts ...
Added: November 29, 2024
A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models
Kardanova E., Ivanova A., Tarasova K. et al., / Series cs.CL "Computation and Language (cs.CL); Artificial Intelligence (cs.AI)". 2024.
The era of large language models (LLM) raises questions not only about how to train models, but also about how to evaluate them. Despite numerous existing benchmarks, insufficient attention is often given to creating assessments that test LLMs in a valid and reliable manner. To address this challenge, we accommodate the Evidence-centered design (ECD) methodology ...
Added: November 5, 2024
Automatic generation of physics items with Large Language Models (LLMs)
Moses Oluoke Omopekunola, Elena Yu. Kardanova, REID (Research and Evaluation in Education) 2024 Vol. 10 No. 2 P. 168–185
High-quality items are essential for producing reliable and valid assessments, offering valuable insights for decision-making processes. As the demand for items with strong psychometric properties increases for both summative and formative assessments, automatic item generation (AIG) has gained prominence. Research highlights the potential of large language models (LLMs) in the AIG process, noting the positive ...
Added: October 14, 2024
GPT3RecBot: a universal chatbot recommender of movies, books and music in Telegram
Lashinin O., Bykov K., Ananyeva M. et al., , in: Proceedings of the Fifth Knowledge-aware and Conversational Recommender Systems Workshop co-located with 17th ACM Conference on Recommender Systems (RecSys 2023)Vol. 3560.: CEUR Workshop Proceedings, 2023. P. 35–43.
Recent advances in large language models have extended their potential use cases to different domains. Models such as ChatGPT have an extensive internal knowledge base that enables them to provide answers to various domain-specific queries. In this paper, we explore the potential use of OpenAI’s GPT3.5 model as a conversational recommender system. We designed a ...
Added: December 2, 2023
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit