UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

Working papers by Cornell University. Series math "arxiv.org". 2021. Article 2105.02135.
Belomestny D., Levin I., Moulines E., Naumov A., Samsonov S., Zorina V.

Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even precise knowledge of the value function $V^{\pi}$ corresponding to a policy $\pi$ does not provide reliable information on how far the policy $\pi$ is from the optimal one. We present a novel model-free upper value iteration procedure (UVIP) that allows us to estimate the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ from above and to construct confidence intervals for $V^{\star}$. Our approach relies on upper bounds to the solution of the Bellman optimality equation via a martingale approach. We provide theoretical guarantees for UVIP under general assumptions and illustrate its performance on a number of benchmark RL problems.
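To make the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ concrete, the following minimal sketch computes it exactly on a tiny MDP. It is not the UVIP procedure: UVIP is model-free and bounds the gap from above, whereas this sketch assumes the transition model is fully known. The two-state MDP, the discount factor, and all function names are hypothetical choices made for illustration only.

```python
# Toy illustration only: a hypothetical 2-state, 2-action MDP with a KNOWN model.
# UVIP itself is model-free; this sketch only shows the quantity that it bounds.

GAMMA = 0.9  # discount factor (assumed for this example)

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
# Action 0 always moves to state 0, action 1 always moves to state 1.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def q_value(V, s, a):
    """One-step lookahead: reward plus discounted expected next-state value."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])


def evaluate_policy(policy, tol=1e-10):
    """Iterate the Bellman expectation operator until V^pi stabilises."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: q_value(V, s, policy[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new


def optimal_values(tol=1e-10):
    """Standard value iteration on the Bellman optimality equation."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new


def suboptimality_gap(policy):
    """Return V*(x) - V^pi(x) for every state x."""
    V_pi = evaluate_policy(policy)
    V_star = optimal_values()
    return {s: V_star[s] - V_pi[s] for s in P}


if __name__ == "__main__":
    # A deliberately poor policy: always take action 0.
    print(suboptimality_gap({0: 0, 1: 0}))
```

In this toy case the policy that always takes action 0 earns no reward at all, so its gap equals $V^{\star}$ itself, while the policy that always takes action 1 has zero gap. When the model is unknown, $V^{\star}$ cannot be computed this way, which is the setting the abstract addresses.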

Research target: Mathematics, Computer Science
Priority areas: IT and mathematics, mathematics
Language: English
Keywords: policy evaluation, policy error, confidence intervals for optimal value function
Publication based on the results of:
Uncertainty quantification in machine learning algorithms (2021)
Similar publications
UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms
Belomestny D., Levin I., Naumov A. et al., Journal of Optimization Theory and Applications 2026 Vol. 208 Article 89
Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). However, even a precise knowledge of the value function Vπ corresponding to a policy π does not provide reliable information on how far the policy π is from the optimal one. We present a novel model-free upper value iteration ...
Added: February 10, 2026