UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

Working papers by Cornell University. Series math "arxiv.org". 2021. Article 2105.02135.

Belomestny D., Levin I., Moulines E., Naumov A., Samsonov S., Zorina V.

Policy evaluation is an important instrument for comparing different algorithms in Reinforcement Learning (RL). Yet even precise knowledge of the value function $V^{\pi}$ corresponding to a policy $\pi$ does not provide reliable information on how far the policy $\pi$ is from the optimal one. We present a novel model-free upper value iteration procedure (UVIP) that allows us to estimate the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ from above and to construct confidence intervals for $V^\star$. Our approach relies on upper bounds to the solution of the Bellman optimality equation, obtained via a martingale approach. We provide theoretical guarantees for UVIP under general assumptions and illustrate its performance on a number of benchmark RL problems.
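The idea of bounding $V^\star$ from above with a martingale correction can be illustrated on a small tabular MDP. The sketch below is our own illustration, not the authors' UVIP implementation. Unlike UVIP, it assumes a known finite MDP (transition tensor `P` of shape (A, S, S), rewards `R` of shape (S, A)) and computes the expectation exactly instead of from samples. All function names are hypothetical. The correction $V^\pi(Y) - (P^a V^\pi)(x)$ has zero mean for every action. Because the expectation of a maximum is at least the maximum of the expectations, the corrected operator dominates the Bellman optimality operator. Its fixed point therefore bounds $V^\star$ from above, and the bound is tight when $\pi$ is optimal.

```python
import numpy as np

# Illustrative tabular sketch (assumption: known finite MDP). NOT the authors' UVIP code,
# which is model-free and works with sampled transitions.

def bellman_q(P, R, gamma, V):
    """Q(s, a) = R(s, a) + gamma * sum_t P[a, s, t] V(t); P has shape (A, S, S)."""
    return R + gamma * np.einsum("ast,t->sa", P, V)

def value_iteration(P, R, gamma, iters=2000):
    """Optimal value V* by plain value iteration (used only as a reference)."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = bellman_q(P, R, gamma, V).max(axis=1)
    return V

def policy_value(P, R, gamma, policy):
    """V^pi of a deterministic policy (one action index per state) via a linear solve."""
    S = R.shape[0]
    P_pi = P[policy, np.arange(S), :]   # row s is P[policy[s], s, :]
    R_pi = R[np.arange(S), policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def upper_value_iteration(P, R, gamma, V_pi, iters=2000):
    """Fixed point of the martingale-corrected operator
    (T_up V)(s) = E_xi max_a [R(s,a) + gamma * (V(Y) - V_pi(Y) + (P^a V_pi)(s))],
    where Y = F(s, a, xi) is drawn by inverse CDF from one shared xi ~ U(0, 1)."""
    A, S, _ = P.shape
    cdf = np.cumsum(P, axis=2)
    PV_pi = np.einsum("ast,t->sa", P, V_pi)   # (P^a V_pi)(s), the mean of V_pi(Y)
    cells = []
    for s in range(S):
        # Split [0, 1] into cells on which every action's next state is constant,
        # so the expectation over xi is computed exactly rather than sampled.
        cuts = np.unique(np.clip(np.concatenate(([0.0, 1.0], cdf[:, s, :].ravel())), 0.0, 1.0))
        mid = 0.5 * (cuts[:-1] + cuts[1:])
        ys = np.stack([np.minimum(np.searchsorted(cdf[a, s], mid, side="right"), S - 1)
                       for a in range(A)])    # ys[a, j]: next state in cell j under action a
        cells.append((np.diff(cuts), ys))
    V = V_pi.copy()
    for _ in range(iters):
        V_new = np.empty(S)
        for s, (w, ys) in enumerate(cells):
            target = R[s][:, None] + gamma * (V[ys] - V_pi[ys] + PV_pi[s][:, None])
            V_new[s] = w @ target.max(axis=0)   # expectation of the max over actions
        V = V_new
    return V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, S, gamma = 3, 6, 0.9
    P = rng.dirichlet(np.ones(S), size=(A, S))   # random transition kernel, shape (A, S, S)
    R = rng.uniform(size=(S, A))
    V_pi = policy_value(P, R, gamma, np.zeros(S, dtype=int))   # "always take action 0"
    V_up = upper_value_iteration(P, R, gamma, V_pi)
    print("upper bound on V* - V^pi:", V_up - V_pi)
    print("true gap V* - V^pi:      ", value_iteration(P, R, gamma) - V_pi)
```

Sharing one noise variable `xi` across all actions matters. If each action drew independent noise, the maximum inside the expectation would be inflated and the upper bound would stay loose even for an optimal policy.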

Research target: Mathematics, Computer Science

Priority areas: IT and mathematics, mathematics

Language: English

Publication based on the results of:

Uncertainty quantification in machine learning algorithms (2021)

Strong Approximations for Markov Chains Weakly Converging to Diffusions

Konakov V., Kucher D., Mammen E., / Series arXiv "math". 2026. No. 2606.11142v1.

In this paper, we construct strong approximations for discrete-time Markov chains weakly converging to continuous diffusion processes, as well as for their perturbed counterparts. Under the assumption of bounded coefficients, we construct closely coupled versions of these processes on a shared probability space. In particular, for both non-degenerate and degenerate cases, we maximize the probability ...

Added: June 11, 2026

Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)

Strube M., Braud C., Hardmeier C. et al., Suzhou: Association for Computational Linguistics, 2025.

Added: June 11, 2026

On the Ramsey Number R(K_{1,s},P_t)

Kh. Kh. Abdullin, D. B. Mokeev, D. S. Taletskii, Mathematical notes 2026 Vol. 119 No. 1 P. 3–7

By the Ramsey number R(K_{1,s},P_t) one means the least positive integer n such that, for every n-vertex graph G, the following condition holds: either G contains a vertex of degree at least s or the complement of G contains a simple t-path. In this paper, we find precise values of R(K_{1,s},P_t) for certain values ...

Added: June 10, 2026
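The definition of R(K_{1,s},P_t) can be made concrete with a brute-force checker for tiny parameters. The sketch below assumes that a "simple t-path" P_t has t vertices; this is our reading, not something the abstract states. The function names are hypothetical. The search enumerates all 2^(n(n-1)/2) labelled graphs, so it is only usable for very small n, and it does not reproduce the paper's results.

```python
from itertools import combinations

def has_path(adj, t):
    """True if the graph (list of neighbour sets) has a simple path on t vertices (assumed convention)."""
    def extend(v, visited):
        if len(visited) == t:
            return True
        return any(extend(u, visited | {u}) for u in adj[v] if u not in visited)
    return any(extend(v, {v}) for v in range(len(adj)))

def condition_holds(n, s, t):
    """Check, for every graph G on n labelled vertices, that G has a vertex of
    degree >= s or the complement of G contains a path on t vertices."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        adj = [set() for _ in range(n)]    # edges of G
        comp = [set() for _ in range(n)]   # edges of the complement of G
        for i, (u, v) in enumerate(pairs):
            target = adj if mask >> i & 1 else comp
            target[u].add(v)
            target[v].add(u)
        if max((len(nb) for nb in adj), default=0) >= s:
            continue
        if has_path(comp, t):
            continue
        return False                       # G is a counterexample for this n
    return True

def ramsey_star_path(s, t, n_max=6):
    """Least n <= n_max for which the condition holds, or None (exponential search)."""
    for n in range(1, n_max + 1):
        if condition_holds(n, s, t):
            return n
    return None

print(ramsey_star_path(1, 3), ramsey_star_path(3, 2))
```

Two cases follow directly from the definition and serve as sanity checks. R(K_{1,1},P_t) = t, because the only graph without a degree-1 vertex is the empty graph, whose complement K_n contains P_t exactly when n ≥ t. R(K_{1,s},P_2) = s + 1, because K_n has maximum degree n − 1 and an edgeless complement.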

TreeDQN: Sample-efficient off-policy reinforcement learning for combinatorial optimization

Sorokin D., Kostin A., Savchenko L. et al., Knowledge-Based Systems 2026 Vol. 348 Article 116258

A convenient approach to optimally solving combinatorial optimization tasks is the Branch-and-Bound method. Its branching heuristic can be learned to solve a large set of similar tasks. Promising results here have been achieved by a recently proposed on-policy reinforcement learning method based on the tree Markov Decision Process. To overcome its main disadvantages, namely, very large training time ...

Added: June 10, 2026

Microbial diversity and production of milk spirit using traditional Buryat fermentation and distillation technologies

Namsaraev Z., Nanzatov B., Kozlova A. et al., Scientific Reports 2026 Vol. 16 No. 1 Article 17769

Distilled fermented milk beverages are rare in food technology, despite the global prevalence of plant-based spirits. Currently, the production of distilled strong alcoholic beverages from fermented milk using traditional technologies is known only among Mongolic-speaking peoples and their Siberian neighbors. This study provides the first interdisciplinary analysis of darasun, a traditional Buryat spirit made from fermented ...

Added: June 10, 2026

Artificial intelligence and digital twins for failure prediction in data center cooling systems: a comprehensive literature review (2018–2026)

Butorova A., Bobakov V., Sergeev A. et al., European Physical Journal: Special Topics 2026 P. 1–19

This paper presents a review of artificial intelligence (AI) methods for failure prediction in data center cooling systems, with a focus on the integration of digital twins (DTs), physics-informed learning, and graph-based models. Positioned within complex network science, this review addresses a limitation of conventional graph approaches—their reliance on pairwise connectivity—whereas real-world failures often arise ...

Added: June 10, 2026

Innovations in Information and Decision Sciences. Proceedings of the 13th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2025), Volume 4

Springer, 2026.

The book presents the proceedings of the 13th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2025), held at the Intelligent Systems Research Group (ISRG), London Metropolitan University, London, United Kingdom, during June 6–7, 2025. Researchers, scientists, engineers and practitioners exchange new ideas and experiences in the domain of intelligent computing theories with ...

Added: June 8, 2026

Wave dynamics within the Whitham-Ostrovsky equation

Flamarion M. V., Pelinovsky E., Nonlinear Dynamics 2026 Vol. 114 Article 784

In this article, we investigate wave packet and solitary wave dynamics in the Whitham–Ostrovsky (WO) equation. By means of a multiple-scales expansion, we formally derive a nonlinear Schrödinger (NLS) equation governing the envelope evolution. The corresponding modulational stability diagram is then obtained using the Lighthill criterion. We show that sufficiently large values of the low-frequency dispersive term render ...

Added: June 5, 2026

ML-based Fast Simulation of FARICH Responses

Shipilov F., Barnyakov A., Ivanov A. et al., / Series Physics "arxiv.org". 2026.

A fast simulation of the detector response is a vital task in high-energy physics (HEP). Traditional Monte-Carlo methods form the backbone of modern particle physics simulation software but are computationally expensive. We present a machine-learning-based approach to fast simulation of the Focusing Aerogel Ring Imaging Cherenkov (FARICH) detector response. Given a particle track and momentum, ...

Added: May 19, 2026

Bifurcations and Structural Stability of Generic PC-HC Families

Dorovskiy A., / Series arXiv "math". 2026.

In this paper the structural stability of generic families of vector fields of the PC-HC class on the two-dimensional sphere is proved. A classification of these families up to moderate equivalence in neighborhoods of their large bifurcation supports is presented, based on such invariants as the configuration and the characteristic set. The realization lemma is proved. ...

Added: May 14, 2026

On the minimum number of maximal distance-k independent sets in trees

Taletskii D., / Series arXiv "math". 2026.

A vertex subset of a graph is called a distance-$k$ independent set if the distance between any two of its distinct vertices is at least $k + 1$. For all $n,k \geq 1$, we determine the minimum possible number of inclusion-wise maximal distance-$k$ independent sets among all $n$-vertex trees. It equals $n$ if $n \leq k ...

Added: May 1, 2026
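The definition of a distance-$k$ independent set can be checked directly on small trees. The sketch below is a hypothetical brute-force counter of inclusion-wise maximal distance-$k$ independent sets: any two chosen vertices must be at distance at least $k + 1$. It is an illustration of the definition only and is exponential in $n$. Distances come from a breadth-first search from every vertex.

```python
from collections import deque
from itertools import combinations

def all_distances(n, edges):
    """Pairwise graph distances of a connected graph (e.g. a tree) via BFS from every vertex."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for src in range(n):
        d = [-1] * n
        d[src] = 0
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if d[y] < 0:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist.append(d)
    return dist

def count_maximal_distance_k_sets(n, edges, k):
    """Count vertex subsets whose vertices are pairwise at distance >= k + 1
    and to which no further vertex can be added (brute force over all subsets)."""
    dist = all_distances(n, edges)

    def independent(subset):
        return all(dist[u][v] >= k + 1 for u, v in combinations(subset, 2))

    count = 0
    for mask in range(1, 1 << n):
        subset = [v for v in range(n) if mask >> v & 1]
        if not independent(subset):
            continue
        # Maximal: adding any vertex outside the subset breaks the distance condition.
        if all(not independent(subset + [w]) for w in range(n) if not mask >> w & 1):
            count += 1
    return count

path5 = [(i, i + 1) for i in range(4)]
print(count_maximal_distance_k_sets(5, path5, 1))
```

For $k = 1$ this counts ordinary maximal independent sets. The path on five vertices has four of them: {1,3,5}, {1,4}, {2,4} and {2,5}.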

On Arithmetic Mirror Symmetry for smooth Fano fourfolds

Ovcharenko M., / Series arXiv "math". 2026.

We introduce an explicit class of tempered Laurent polynomials in the sense of Villegas and Doran–Kerr in n ⩽ 4 variables including all Landau–Ginzburg models for smooth Fano threefolds with very ample anticanonical class. We check that it contains Landau–Ginzburg models for various Fano fourfolds which are complete intersections in smooth toric varieties and Grassmannians of planes, ...

Added: April 30, 2026

Natural hazard database from Internet publications: text mining with a large language model

Derkacheva A., Sakirkina M., Kraev G. et al., /. 2026.

Comprehensive data on natural hazards and their consequences are crucial for effective risk assessment, adaptation planning, and emergency response. However, many countries face challenges with fragmented, inconsistent, and inaccessible data, particularly regarding local-scale events. To address this data gap in Russia, we developed an end-to-end processing pipeline that scrapes news from various online sources, ...

Added: April 28, 2026

Algorithmic overlaps as thermodynamic variables: from local to cluster Monte Carlo dynamics in critical phenomena

Pilé I., Deng Y., Shchur L., / Series arXiv "math". 2026. No. 2604.10254.

We investigate the spatial overlap of successive spin configurations in Markov chain Monte Carlo simulations using the local Metropolis algorithm and the Swendsen–Wang and Wolff cluster algorithms. We examine the dynamics of these algorithms for two models in different universality classes: the Ising model and the three-state Potts model. The overlap of two ...

Added: April 20, 2026

On weak solutions to the 1d compressible Navier-Stokes equations: a Lipschitz continuous dependence on data in weaker norms and an error of their homogenization

Zlotnik Alexander, / Series arXiv "math". 2026. No. 2602.03481v1.

We deal with the global in time weak solutions to the 1D compressible Navier-Stokes system of equations for large discontinuous initial data and nonhomogeneous boundary conditions of three standard types. We prove the Lipschitz-type continuous dependence of the solution $(\eta,u,\theta)$, in a norm slightly stronger than $L^{2,\infty}(Q)\times L^2(Q)\times L^2(Q)$, on the initial data $(\eta^0,u^0,e^0)$ in a ...

Added: April 18, 2026

On the dimension of the space of static potentials on three-manifolds

Medvedev V., / Series arXiv "math". 2026.

We investigate the interplay between the dimension of the space of static potentials and the geometric and topological structure of the underlying static three-manifold. A partial classification of boundaryless static manifolds is obtained in terms of this dimension. We also treat the case of static manifolds with boundary. In particular, we prove that if a ...

Added: April 3, 2026

Using predefined vector systems to speed up neural network multimillion class classification

Gabdullin N., Androsov I., / Series Computer Science "arxiv.org". 2026.

Label prediction in neural networks (NNs) has O(n) complexity, where n is the number of classes. This holds true for classification using fully connected layers and cosine similarity with some set of class prototypes. In this paper we show that if NN latent space (LS) geometry is known and possesses specific properties, label prediction complexity can ...

Added: April 2, 2026

Homogeneous maximizers of the Blaschke–Santaló-type functionals

Kolesnikov A., / Series arXiv "math". 2025.

We study Blaschke–Santaló-type inequalities for N ≥ 2 sets (functions) and a special class of cost functions. In particular, we prove new results about reduction of the maximization problem for the Blaschke–Santaló-type functional to the homogeneous case (functional inequalities on the sphere) and extend the symmetrization argument to the case of N > 2 sets. We also discuss links to the ...

Added: February 13, 2026

UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

Belomestny D., Levin I., Naumov A. et al., Journal of Optimization Theory and Applications 2026 Vol. 208 Article 89

Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). However, even a precise knowledge of the value function Vπ corresponding to a policy π does not provide reliable information on how far the policy π is from the optimal one. We present a novel model-free upper value iteration ...

Added: February 10, 2026

Russia on the Path Towards a New Technology Industrial Policy: Exciting Prospects and Fatal Traps

Simachev Y. V., Kuzyk M., Kuznetsov B. et al., Foresight and STI Governance 2014 Vol. 8 No. 4 P. 6–23

Industrial policy has traditionally been under scrutiny worldwide. In recent years, issues of its elaboration have gained increased importance in Russia as well. Among the forefront tasks are the harmonization of domestic industrial policy with science, technology and innovation policy, taking into account the specificity of different sectors and technological areas, diversification of the national economy, the formation of ...

Added: October 22, 2025

Studying the Business Climate in Russian Science: Testing the Approach

Gershman M., Gokhberg L., Kuznetsova T., Voprosy Ekonomiki 2025 No. 6 P. 114–136

In this article, we discuss a novel approach to the assessment of the situation (business climate) in the science and technology (S&T) field, as well as the results of its testing within three large-scale surveys of top-managers of R&D organizations and universities conducted in 2017, 2022 and 2024. The methodology is based on the theory ...

Added: June 14, 2025

Doing science: an approach to a comprehensive assessment of the business climate for science and technology

Gokhberg L., Meissner D., Gershman M. et al., Technology in Society 2025 Vol. 82 Article 102948

Assessment of science and technology has become widespread in policy making to measure the impact of public funding, align respective policy portfolios and initiate changes in the national institutional environment. In that respect, evaluations of different scope, shape and size and for different purposes are developed and implemented. Such exercises mainly focus on institutions or ...

Added: June 10, 2025

Changes in Healthy Lifestyles during the COVID-19 Pandemic and Public Policy: A Systematic Review of Studies

Zasimova L. S., Kolosnitsyna M., Kossova T. V. et al., Electronic scientific journal "Social Aspects of Population Health" 2024 Vol. 70 No. 2 Article 12

Significance. The COVID-19 pandemic and state anti-epidemic policies have significantly changed everyday habits and lifestyles. In many countries, governments introduced self-isolation, lockdowns, and generous family benefits to support the population, on top of existing and newly developed healthy lifestyle measures. Their effects (often multidirectional) overlapped, making it difficult to estimate the ...

Added: June 10, 2024

Do Counter-sanctions in Agriculture Promote Growth? Evidence from Russia

Kotyrlo E., Zaytsev A., Applied Economics 2024 Vol. 56 No. 56 P. 7563–7574

This study evaluates the effect of the Russian agri-food embargo introduced in 2014. The embargo was expected to boost growth in Russian agriculture due to an import substitution policy. To isolate its effect on the Russian agricultural growth rate from other factors like currency devaluation, overall economic stagnation and external financial sanctions, we used the ...

Added: December 6, 2023