UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

D. Belomestny; I. Levin; A. Naumov; S. Samsonov

doi:10.1007/s10957-025-02903-1

Journal of Optimization Theory and Applications. 2026. Vol. 208. Article 89.

Belomestny D., Levin I., Naumov A., Samsonov S.

Policy evaluation is an important instrument for comparing different algorithms in Reinforcement Learning (RL). However, even precise knowledge of the value function Vπ corresponding to a policy π does not provide reliable information on how far the policy π is from the optimal one. We present a novel model-free upper value iteration procedure (UVIP) that allows us to estimate the suboptimality gap V*(x) − Vπ(x) from above and to construct confidence intervals for the optimal value function V*. Our approach relies on upper bounds to the solution of the Bellman optimality equation via the martingale approach. We provide theoretical guarantees for UVIP under general assumptions and illustrate its performance on a number of benchmark RL problems.

Communicated by Alexander Vladimirovich Gasnikov

Research target: Computer Science

Keywords: policy evaluation; policy error; confidence intervals for optimal value function; reinforcement learning; model-free algorithm

Stable On-the-Fly Learning for Dynamic Neural Networks With Delayed Inputs

Kibkalo Vladislav, Chertopolokhov V., Mukhamedov A. et al., IEEE Access 2026 Vol. 14 P. 14369–14392

This study presents on-the-fly identification and multi-step prediction of nonlinear systems with delayed inputs using a dynamic neural network combined with a smooth projection onto ellipsoids. The projection enforces parameter constraints that guarantee stability, while a Lyapunov–Krasovskii analysis yields computable ultimate error bounds. Riccati-type matrix inequalities are derived, providing an efficient vectorization–projection–devectorization implementation suitable for ...

Added: May 22, 2026

Applying Social Network Analysis (SNA) to the Historical Narrative of a Polysubjective Region (The Case of the Welsh Chronicle Brut y Tywysogyon)

Loshkareva M. E., Matveeva N., Вестник Томского государственного университета. История 2026 No. 100 P. 112–118

This research is an attempt to apply social network analysis (SNA) to the study of a medieval narrative source. The authors suggest that network analysis may offer new possibilities for studying the history of regions characterized by political fragmentation. The authors tried to construct networks of historical interactions from 1193 ...

Added: May 22, 2026

ML-based Fast Simulation of FARICH Responses

Shipilov F., Barnyakov A., Ivanov A. et al., Series Physics "arxiv.org". 2026.

A fast simulation of the detector response is a vital task in high-energy physics (HEP). Traditional Monte-Carlo methods form the backbone of modern particle physics simulation software but are computationally expensive. We present a machine-learning-based approach to fast simulation of the Focusing Aerogel Ring Imaging Cherenkov (FARICH) detector response. Given a particle track and momentum, ...

Added: May 19, 2026

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Rabat: Association for Computational Linguistics, 2026.

Added: May 19, 2026

Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures

Bezzubov S., Malikov D., Krasnov L. et al., Scientific data 2026 Vol. 13 Article 727

Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, in technological processes mixtures of solvents are often utilized, making the solubility assessment more complicated. Predicting solubility values in mixtures of solvents from a molecular structure can help to address this issue, although a ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Pikalov V., Meshcheryakov V., Kondratev S. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27

This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Kondratev S., Yulia Dyrchenkova, Georgiy Nikitin et al., Technologies 2026 Vol. 14 No. 1 Article 69

Added: May 19, 2026

Parallel Computational Technologies. PCT 2025

Springer, 2025.

This book constitutes the refereed proceedings of the 19th International Conference on Parallel Computational Technologies, PCT 2025, held in Moscow, Russia, during April 8–10, 2025. The 31 full papers included in this volume were carefully reviewed and selected from 122 submissions. These papers were organized under the following topical sections: High Performance Architectures, Tools and Technologies; ...

Added: May 18, 2026

KMHCR: A Key-Controlled Signal-Domain Transformation for 5G IoT Security

Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15

To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...

Added: May 16, 2026

DPN Verifier: A Toolkit for Faster Soundness Verification and Repair of Process Models with Data

Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66

Data Petri Nets (DPNs) extend classical Petri nets to model processes where data directly influences control flow, enabling a comprehensive view of system behavior and making it possible to detect failure points that could otherwise be hidden. Soundness is a correctness criterion that captures such failure points as deadlocks and livelocks as well as model boundedness and absence ...

Added: May 16, 2026

QGKM: A Quantum Fidelity-Based Graph Clustering Framework for Robust Data Pattern Recognition in Education Social Networks

Xiong N., Long W., He D. et al., Algorithms 2026 Vol. 19 No. 5 Article 386

In the era of data-driven education, educational social networks generate large volumes of high-dimensional and complex-structured data through learner interactions, collaborative activities, and resource-sharing behaviors, posing significant challenges to traditional unsupervised learning methods. Such data often exhibit non-convex distributions, heterogeneity, and noise sensitivity, making conventional clustering approaches insufficient for capturing their intrinsic structural relationships. To ...

Added: May 13, 2026

Proceedings of the 9th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing

Velichkov B., Nikolova-Koleva I., Slavcheva M., Shumen: INCOMA Ltd, 2025.

The RANLP 2025 Student Research Workshop (RANLPStud’2025) is a special track of the established international conference Recent Advances in Natural Language Processing (RANLP’2025). The RANLPStud is being organised for the 9th time and this year is running in parallel with the other tracks of the main RANLP 2025 conference. The target of RANLPStud’25 is to be a ...

Added: May 12, 2026

Parallel Computational Technologies, 19th International Conference, PCT 2025, Moscow, Russia, April 8–10, 2025, Revised Selected Papers. (CCIS, volume 2891)

Springer, 2026.

Added: May 12, 2026

An Integrated Simulation Environment for Verification and Validation of Control Software for Connected and Highly Automated Vehicles

Stepanyants V., Dolgov I. M., Khoroshilov G. S. et al., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3 P. 95–110

Highly automated and connected vehicles are gradually entering the market. Currently, solutions are being proposed that allow these technologies to be used for cooperative driving automation, which can significantly improve traffic safety. Such technologies and their software should be tested to ensure safety before being implemented in real systems. Verification and validation of vehicular control ...

Added: May 12, 2026

Connected and Automated Vehicle Scenario Manager Graphical User Interface

Tikhonov R., Efendiev M. T., Fedotenkov A. A., 2026 International Russian Smart Industry Conference (SmartIndustryCon) 2026 P. 542–547

High-fidelity simulation environments like CARLA and ROS are essential for connected and automated vehicle research. They allow researchers to verify and validate new software and technology without the time, financial, and safety overheads of real-world testing. However, their operation requires considerable expertise for creating platform-specific scenario configuration files, which complicates the research workflow. This paper ...

Added: May 11, 2026

Proceedings 2026 IEEE 11th International Conference on Smart Cloud SmartCloud 2026 8-10 May 2026

Los Alamitos: IEEE Computer Society, 2026.

It is a great pleasure for us to welcome you, on behalf of the conference committees, to the 11th IEEE International Conference on Smart Cloud (IEEE SmartCloud 2026). We are glad that we can hold this international conference in New York City, USA. Now, please allow us to introduce the IEEE SmartCloud 2026 conference. The ...

Added: May 10, 2026

From the Unknown to Transparency: A Review of Explainable AI (XAI) Technologies

Avdoshin S. M., Pesotskaya E. Y., Информационные технологии 2026 Vol. 32 No. 4 P. 185–194

With the rapid advancement of artificial intelligence, and deep learning in particular, models have emerged that are capable of delivering highly accurate predictions. However, the internal logic of such models remains difficult to interpret—an issue of critical importance, especially in domains where the correctness of an algorithm directly affects high-stakes decision-making. One promising avenue for ...

Added: May 8, 2026

Explainable AI for Industry 5.0: Shedding light on the black box

Avdoshin S. M., Pesotskaya E. Y., Business Informatics 2026 Vol. 20 No. 1 P. 7–28

The rapid development of artificial intelligence (AI) is accompanied by increasing computational complexity and decreasing model transparency, which significantly limits its adoption in critical domains that require a high level of trust, interpretability, and justification of decisions. Under these conditions, the field of Explainable Artificial Intelligence (XAI) has gained particular importance as it focuses on approaches and technologies that ...

Added: May 8, 2026

Comparative Analysis of Students’ Perceptions of Programming Puzzles: Parson’s and Wordle-Like

Varnavsky A., IEEE Access 2026 Vol. 14 P. 37487–37508

Puzzles are an excellent tool for learning computer science and programming, fostering increased interest, engagement, and motivation among students, as well as developing logical, critical, and computational thinking. Among beginner programmers, Parson's Programming Puzzles are quite popular, aimed at mastering the basic syntactic and logical constructs of programming languages. However, as students' skills grow, their ...

Added: May 7, 2026

Towards performance analysis of GPU-aware MPI over Angara interconnect

Ismagilov T., Mukosey A., Smirnov F. et al., International Journal of High Performance Computing Applications 2026 Vol. 40 No. 2 P. 240–253

One of the most important aspects of supercomputer development in the post-Moore era is the interconnect technologies that allow one to unite a multitude of processing elements into a well-synchronized computing system. Novel types of supercomputer interconnect require careful benchmarking and compliance with the requirements of modern hardware trends. GPU-based heterogeneous computing is one of ...

Added: May 7, 2026

Russia on the Path Towards a New Technology Industrial Policy: Exciting Prospects and Fatal Traps

Simachev Y. V., Kuzyk M., Kuznetsov B. et al., Foresight and STI Governance 2014 Vol. 8 No. 4 P. 6–23

Traditionally industrial policy is under scrutiny worldwide. In recent years, issues of its elaboration have gained increased importance in Russia as well. Among the forefront tasks are the harmonization of domestic industrial policy with science, technology and innovation policy, taking into account the specificity of different sectors and technological areas, diversification of the national economy, the formation of ...

Added: October 22, 2025

Impact of self-learning based high-frequency traders on the stock market

Mansurov K., Semenov A., Dmitry Grigoriev et al., Expert Systems with Applications 2023 Vol. 232 Article 120567

In this paper we investigate the role of self-learning agents in multi-agent models of financial markets. We develop an agent-based simulation model of a financial market and, in addition to the agents with fixed strategies used in previous research, we introduce an agent with a self-learning strategy. To model the behavior of such an agent, ...

Added: July 11, 2025

Cryptocurrency Exchange Simulation

Mansurov K., Semenov A., Dmitry Grigoriev et al., Computational Economics 2024 Vol. 64 P. 2585–2603

In this paper, we consider the approach of applying state-of-the-art machine learning algorithms to simulate some financial markets. In this case, we choose the cryptocurrency market based on the assumption that such markets are more active today. As a rule, they have more volatility, attracting riskier traders. Considering classic trading strategies, we also introduce an agent with a ...

Added: July 11, 2025

Studying the Business Climate in Russian Science: Testing an Approach

Gershman M., Gokhberg L., Kuznetsova T., Вопросы экономики 2025 No. 6 P. 114–136

In this article, we discuss a novel approach to assessing the situation (business climate) in the science and technology (S&T) field, as well as the results of its testing within three large-scale surveys of top managers of R&D organizations and universities conducted in 2017, 2022 and 2024. The methodology is based on the theory ...

Added: June 14, 2025