UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

Working papers by Cornell University. Series math "arxiv.org". 2021. Article 2105.02135.

Беломестный Д. В., Левин И. В., Мулине Э. Ф., Наумов А. А., Самсонов С. В., Зорина В. О.

Policy evaluation is an important tool for comparing different algorithms in Reinforcement Learning (RL). Yet even precise knowledge of the value function $V^{\pi}$ of a policy $\pi$ does not reliably indicate how far $\pi$ is from the optimal policy. We present a novel model-free upper value iteration procedure (UVIP) that allows us to estimate the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ from above and to construct confidence intervals for $V^{\star}$. Our approach relies on upper bounds to the solution of the Bellman optimality equation, obtained via a martingale approach. We provide theoretical guarantees for UVIP under general assumptions and illustrate its performance on a number of benchmark RL problems.
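To make the upper-bound idea concrete, here is a minimal Python sketch under stated assumptions. It uses a small tabular MDP with a known transition kernel, whereas the paper's procedure is model-free and estimates these quantities from samples; using the exact model is a simplification for illustration. For a fixed policy $\pi$, the correction $\Phi(x,a,\xi) = V^{\pi}(Y^a) - P^a V^{\pi}(x)$ has zero mean for every action, so averaging $\max_a$ of the corrected targets over the noise dominates $\max_a$ of their averages, which is the Bellman optimality operator. In exact expectation the fixed point of the resulting recursion therefore lies above $V^{\star}$, and when $\pi$ is optimal the correction cancels the noise and the recursion returns $V^{\star}$ itself. The function names (`policy_value`, `optimal_value`, `upper_value_iteration`), the inverse-CDF sampling with noise shared across actions, and the toy MDP are hypothetical choices, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: the helper names and the use of the exact model
# for V^pi and P^a V^pi are assumptions, not the UVIP authors' implementation.


def policy_value(P, r, pi, gamma):
    """Exact V^pi of a tabular MDP: solve (I - gamma * P_pi) V = r_pi."""
    n_states = P.shape[1]
    idx = np.arange(n_states)
    P_pi = P[pi, idx]  # row x is P[pi[x], x, :]
    r_pi = r[pi, idx]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)


def optimal_value(P, r, gamma, iters=2000):
    """Plain value iteration for V*, used only as a reference."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        V = (r + gamma * (P @ V)).max(axis=0)
    return V


def upper_value_iteration(P, r, pi, gamma, n_samples=1000, iters=300, seed=0):
    """Martingale-corrected upper recursion, averaged over sampled noise."""
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = P.shape
    V_pi = policy_value(P, r, pi, gamma)
    PV_pi = P @ V_pi  # PV_pi[a, x] = (P^a V^pi)(x)
    cdf = np.cumsum(P, axis=2)
    xi = rng.random((n_states, n_samples))  # one noise draw per (x, m), shared by all actions
    Y = np.empty((n_actions, n_states, n_samples), dtype=int)
    for a in range(n_actions):
        for x in range(n_states):
            # Inverse-CDF sampling of the next state: Y = F(x, a, xi)
            Y[a, x] = np.searchsorted(cdf[a, x], xi[x], side="right")
    Y = np.minimum(Y, n_states - 1)  # guard against a cdf that rounds below 1
    Phi = V_pi[Y] - PV_pi[:, :, None]  # zero-mean correction for each action
    V_up = V_pi.copy()
    for _ in range(iters):
        target = r[:, :, None] + gamma * (V_up[Y] - Phi)
        V_up = target.max(axis=0).mean(axis=1)  # max over actions, then average over noise
    return V_up, V_pi


# Toy 2-state, 2-action MDP (made up for illustration): P[a, x, y], r[a, x]
gamma = 0.9
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.2, 0.8], [0.8, 0.2]]])
r = np.array([[0.0, 1.0],
              [0.5, 0.0]])
V_star = optimal_value(P, r, gamma)
V_up, V_pi = upper_value_iteration(P, r, np.array([0, 1]), gamma)  # a poor policy
print("estimated gap:", V_up - V_pi, "true gap:", V_star - V_pi)
```

As a quick sanity check, rerunning `upper_value_iteration` with the greedy policy for `V_star` should return `V_up` equal to `V_star` up to rounding, since the correction then cancels the sampled noise exactly. With a finite noise sample the recursion only estimates the upper bound, so for a suboptimal policy `V_up` is not guaranteed to stay above `V_star`.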

Research areas: Mathematics, Computer Science

Priority areas: computational mathematics, mathematics

Language: English

This publication was prepared based on the results of the project:

Uncertainty Analysis in Machine Learning Algorithms (2021)

Open Hurwitz numbers and the mKP hierarchy

Буряк А. Ю., Tessler R., Troshkin M., Journal of Geometry and Physics 2026 Vol. 223 Article 105783

We give a natural definition of open Hurwitz numbers, where the weight of each ramified covering includes an integer parameter N taken to the power that is equal to the number of boundary components of a Riemann surface with boundary mapping to . We prove that the resulting sequence of partition functions, depending on , is a tau-sequence of ...

Added: June 19, 2026

Bihamiltonian structure of the DR hierarchy in the semisimple case

Буряк А. Ю., Rossi P., Communications in Mathematical Physics 2025 Vol. 406 Article 205

Of the two approaches to integrable systems associated to semisimple cohomological field theories (CohFTs), the one suggested by Dubrovin and Zhang and the more recent one using the geometry of the double ramification (DR) cycle, the second has the advantage of being very explicit. The Poisson operator of the DR hierarchy is , where is the metric ...

Added: June 19, 2026

Benchmarking DNA large language models on quadruplexes

Cherednichenko O., Herbert A., Попцова М. С., Computational and Structural Biotechnology Journal 2025 Vol. 27 P. 992–1000

Added: June 19, 2026

Kolmogorov–Arnold networks for genomic tasks

Попцова М. С., Briefings in Bioinformatics 2025 Vol. 26 No. 2 P. 1–11

Added: June 19, 2026

Graph Patterns in Inconsistent Declarative Process Models

Анненков А. Н., Нестеров Р. А., Modeling and Analysis of Information Systems 2026 Vol. 33 No. 2 P. 176–205

Declarative process models are widely used in process mining to describe process behavior flexibly through sets of constraints. However, models automatically discovered from event logs may contain inconsistent constraints, which hinders their interpretation and makes them unsuitable for execution, conformance checking, or further analysis. Existing consistency analysis methods either rely on automata-based constructions with high asymptotic complexity ...

Added: June 18, 2026

Advances in Information Retrieval: 48th European Conference on Information Retrieval, ECIR 2026, Delft, The Netherlands, March 29 – April 2, 2026, Proceedings, Part II. (LNCS, volume 16484)

Cham: Springer Publishing Company, 2026.

Added: June 18, 2026

Artificial Intelligence as a Rose of Scientific Activity: An Inquiry by Timothy Gowers

Поддьяков А. Н., Troitsky Variant. Nauka 2026 No. 12 P. 24–25

This popular-science note reviews a post by Fields Medalist Timothy Gowers on the capabilities of AI in mathematics, together with the comments under the post. The review was produced mainly by the DeepSeek chatbot. The note concludes by discussing the possibility of AI not only solving problems but also posing them. ...

Added: June 18, 2026

Exploring New Frontiers in Vertical Federated Learning: the Role of Saddle Point Reformulation

Beznosikov A., Kormakov G., Grigorievskiy A. et al., Journal of Optimization Theory and Applications 2026 Vol. 209 Article 18

Added: June 17, 2026

Optimal Extraction with an Impact on Diffusion-Jump Pricing

Garzón J., Mora Rodríguez J., Морено Ф. Г., Applied Mathematics and Optimization 2026 Vol. 94 No. 10 P. 1–43

Added: June 17, 2026

Supervised Learning in Critical Phenomena—Statistical and Systematic Accuracy

Chertenkov V. I., Щур Л. Н., Lobachevskii Journal of Mathematics 2026 Vol. 47 No. 2 P. 720–727

Added: June 16, 2026

Enhancing Emotion Recognition in Speech Based on Self-Supervised Learning: Cross-Attention Fusion of Acoustic and Semantic Features

Deeb B., Andrey V. Savchenko, Макаров И. А., IEEE Access 2026 Vol. 13 P. 56283–56295

Added: June 16, 2026

Automated detection of wolf howls using audio spectrogram transformers

Makarov N., Савченко А. В., Zemtsova I. et al., Scientific Reports 2025 Vol. 15 Article 26641

Added: June 16, 2026

Artificial intelligence framework for multi-pathology risk assessment from retinal fundus images: deep learning approach to 15-disease screening

Vasilev R., Савченко А. В., Blinov P. et al., Frontiers in Medicine 2026 Vol. 13

Added: June 16, 2026

From Data to Signs: A Foundation Model for Multilingual Sign Language Recognition

Novopoltsev M., Tulenkov A., Murtazin R. et al., IEEE Access 2025 Vol. 13 P. 188170–188181

Added: June 16, 2026

On the Design of Targeted Admission in Russia

Нестеров А. С., Journal of the New Economic Association 2026

This article examines targeted admission to universities in Russia from the standpoint of matching markets and mechanism design, a key area of modern game theory. We study the targeted admission mechanism: the set of rules by which a three-sided matching is formed among an applicant, a sponsoring organization, and an educational program. The mechanism used in Russia has ...

Added: June 16, 2026

B3Emo: Quantifying Affect as a Double-Edged Sword in Strategic LLM Interactions