Optimal Approximation of Average Reward Markov Decision Processes

Y. Sapronov; N. Yudin

doi:10.1134/S0965542524702191

Publications

?

Optimal Approximation of Average Reward Markov Decision Processes

Computational Mathematics and Mathematical Physics. 2025. Vol. 65. No. 3. P. 567–581.

Sapronov Y., Yudin N.

We continue to develop the concept of studying the ε-optimal policy for Average Reward Markov Decision Processes (AMDP) by reducing it to Discounted Markov Decision Processes (DMDP). Existing research often stipulates that the discount factor must not fall below a certain threshold. Typically, this threshold is close to one, and as is well-known, iterative methods used to find the optimal policy for DMDP become less effective as the discount factor approaches this value.

Our work distinguishes itself from existing studies by allowing for inaccuracies in solving the empirical Bellman equation. Despite this, we have managed to maintain the sample complexity that aligns with the latest results. We have succeeded in separating the contributions from the inaccuracy of approximating the transition matrix and the residuals in solving the Bellman equation in the upper estimate so that our findings enable us to determine the total complexity of the epsilon-optimal policy analysis for DMDP across any method with a theoretical foundation in iterative complexity.

Research target: Mathematics Computer Science

Language: English

DOI

Text on another site

Keywords: Markov Decision Processes вычислительная сложность обучение с подкреплением адгритмы и алгоритмическая сложность размер выборки sample complexity reinforcement learning (RL)iteration complexity марковские процессы принятия решений

Upper bounds for Steklov eigenvalues of a hypersurface of revolution

Denis Seliutskii, Russian Journal of Mathematical Physics 2025 Vol. 32 No. 2 P. 399–407

In this paper, we find an upper bound for the first Steklov eigenvalue for a surface of revolution with boundary consisting of two spheres of different radii. Moreover, we prove that, in some cases, this boundary is sharp. ...

Added: May 19, 2026

Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Association for Computational Linguistics, 2026.

Added: May 19, 2026

Dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures

Bezzubov S., Malikov D., Krasnov L. et al., Scientific data 2026 Vol. 13 Article 727

Solubility is a crucial property of organic compounds, impacting their potential applications in synthetic chemistry, materials science and drug design. Moreover, in technological processes mixtures of solvents are often utilized, making the solubility assessment more complicated. Predicting solubility values in mixtures of solvents from a molecular structure can help to address this issue, although a ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Pikalov V., Meshcheryakov V., Kondratev S. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27

This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises ...

Added: May 19, 2026

Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2

Кондратьев С., Никитин Г. Э., Дырченкова Ю. А. et al., Technologies 2026 Vol. 14 No. 1 P. 1–27

Added: May 19, 2026

On smooth Fano threefolds with coregularity zero

Жакупов О. Б., European Journal of Mathematics 2025 Vol. 11 Article 84

We provide examples of smooth three-dimensional Fano complete intersections of degree 2, 4, 6, and 8 that have absolute coregularity 0. Considering the main theorem of Avilov, Loginov, and Przyjalkowski (CNTP 18:506–577, 2024) on the remaining 101 families of smooth Fano threefolds, our result implies that each family of smooth Fano threefolds has an element of absolute ...

Added: May 18, 2026

Parallel Computational Technologies. PCT 2025

Springer, 2025.

This book constitutes the refereed proceedings of the 19th International Conference on Parallel Computational Technologies, PCT 2025, held in Moscow, Russia, during April 8–10, 2025. The 31 full papers included in this volume were carefully reviewed and selected from 122 submissions. These papers were organized under the following topical sections: High Performance Architectures, Tools and Technologies; ...

Added: May 18, 2026

KMHCR: A Key-Controlled Signal-Domain Transformation for 5G IoT Security

Ronglin Z., Wei L., Jiahong C. et al., Journal of Signal Processing Systems 2026 Vol. 98 P. 1–15

To address the need for lightweight and low-latency protection in massive resource-constrained 5G Internet of Things (IoT) systems, this paper proposes Key-Controlled Modulation Hopping and Constellation Rotation (KMHCR). KMHCR is designed as a physical-layer confidentiality-enhancement mechanism that avoids bit-wise full-payload encryption in the protection pipeline. It uses a shared key derived from channel-reciprocity secret key ...

Added: May 16, 2026

DPN Verifier: A Toolkit for Faster Soundness Verification and Repair of Process Models with Data

Suvorov N. M., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3(2) P. 49–66

Data Petri Nets (DPNs) extend classical Petri nets to model processes where data directly influences control-flow, enabling a comprehensive view of system behavior and possibility to detect failure points that could otherwise be hidden. Soundness is a correctness criterion that captures such failure points as deadlocks and livelocks as well as model boundedness and absence ...

Added: May 16, 2026

2-Elliptic Periodic Orbits near a Nonsimple Homoclinic Tangency in Four-Dimensional Symplectic Maps

Lerman L. M., Turaev D. V., Regular and Chaotic Dynamics 2026 Vol. 31 No. 3 P. 349–369

We show that bifurcations of four-dimensional symplectic diffeomorphisms with a quadratic homoclinic tangency to a saddle periodic orbit with real multipliers produce 2-elliptic periodic orbits if the tangency is not partially hyperbolic. We show that a normal form for the rescaled first-return maps near such tangency is given by a four-dimensional symplectic H´enonlike map and study bifurcations of the ...

Added: May 15, 2026

Bibliometric Analysis by Network Models

Aleskerov F. T., Yakuba V. I., Khutorskaya O. et al., Springer, 2026.

The book contains new models of bibliometric analysis based on centrality measures in network analysis, pattern analysis and stability analysis. A distinctive feature of these centrality measures is that they account for the parameters of vertices and group influence of vertices to a vertex. This reveals specific groups of publications, authors, terms, journals and affiliations ...

Added: May 15, 2026

Neural-network maps for two-parameter modeling of bistability and codimension-two bifurcations in two-dimensional flow dynamical systems

Kuptsov P., Panyushev A., Stankevich N., Chaos 2026 Vol. 36 No. 5 Article 053138

We develop a machine-learning approach to reproduce the behavior of two versions of the van der Pol oscillator exhibiting a subcritical Andronov–Hopf bifurcation, with or without a codimension-2 Bautin point. We construct a neural-network model that functions as a recur rent map and train it on short segments of oscillator trajectories. The results show that, ...

Added: May 15, 2026

Bifurcations and Structural Stability of Generic PC-HC Families

Dorovskiy A., / Series arXiv "math". 2026.

In this paper the structural stability of generic families of vector fields of the PC-HC class on the two-dimensional sphere is proved. A classification of these families up to moderate equivalence in neighborhoods of their large bifurcation supports is presented, based on such invariants as the configuration and the characteristic set. The realization lemma is proved. ...

Added: May 14, 2026

The Sobolev space W_2^{1/2}: Simultaneous improvement of functions by a homeomorphism of the circle

Lebedev V., Journal of Mathematical Analysis and Applications 2026 Vol. 563 No. 2 Article 130787

It is known that for every continuous real-valued function $f$ on the circle $\mathbb T=\mathbb R/2\pi\mathbb Z$ there exists a change of variable, i.e., a self-homeomorphism $h$ of $\mathbb T$, such that the superposition $f\circ h$ is in the Sobolev space $W_2^{1/2}(\mathbb T)$. We obtain new results on simultaneous improvement of functions by a single change of variable in relation ...

Added: May 14, 2026

О СЛОЖНОСТИ ПРОБЛЕМЫ ТОТАЛЬНОЙ ВЫВОДИМОСТИ В НЕУКОРАЧИВАЮЩИХ И КОНТЕКСТНО-СВОБОДНЫХ ГРАММАТИКАХ

Dudakov S., Карлов Б. Н., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 Т. 524 № 1 С. 11–18

In this paper we study the problem of total derivability in context-free, noncontracting, and context-sensitive grammars. Given a grammar and a terminal word, one has to determine whether there exists a derivation of this word which uses each production no less than a given number of times. It is proved that the problem of total ...

Added: March 18, 2026

О схлопывании вероятностных иерархий. I

Speranski S. O., Алгебра и логика 2013 Т. 52 № 2 С. 236–254

Изучаются иерархии проблем общезначимости для префиксных фрагментов вероятностной логики с кванторами по пропозициональным формулам, обозначаемой QPL, и её вариантов. Доказывается: если подполе F вещественных чисел определимо в стандартной модели арифметики посредством формулы второго порядка, не содержащей кванторов по множествам, то проблема общезначимости над F-значными вероятностными структурами для $\Sigma_4$-QPL-предложений является $\Pi^1_1$-полной и, как следствие, соответствующая иерархия проблем общезначимости схлопывается. Более того, при ...

Added: December 27, 2025

Некоторые классификации сложности задачи о вершинной 3-раскраске

Дахно Г. С., Malyshev D., Математические заметки 2026 Т. 119 № 3 С. 360–376

Наследственный класс — множество графов, замкнутое относительно удаления вершин. Каждый такой класс имеет каноническое описание посредством минимальных запрещенных порожденных фрагментов. Задача о вершинной 3-раскраске (задача 3-ВР) для заданного графа состоит в том, чтобы определить, а можно ли множество его вершин разбить на три подмножества попарно несмежных вершин. Известна дихотомия сложности этой задачи для всех наследственных ...

Added: November 26, 2025

NP-полнота игры “Ханаби” при минимальных параметрах

Onoprienko A., Доклады Российской академии наук. Математика, информатика, процессы управления (ранее - Доклады Академии Наук. Математика) 2025 № 527 С. 206–216

We study the algorithmic complexity of the cooperative card game Hanabi. The feature of Hanabi is that players see each other’s cards but not their own, and exchange information through hints. Even in the model with one player who has full information about the deck, Hanabi remains NP-hard. We found the minimal parameters ofthe game ...

Added: November 23, 2025

Weighted mesh algorithms for general Markov decision processes: Convergence and tractability

Belomestny D., Schoenmakers J., Zorina V., Journal of Complexity 2025 Vol. 88 Article 101932

We introduce a mesh-type approach for tackling discrete-time, finite-horizon Markov Decision Processes (MDPs) characterized by state and action spaces that are general, encompassing both finite and infinite (yet suitably regular) subsets of Euclidean space. In particular, for bounded state and action spaces, our algorithm achieves a computational complexity that is tractable in the sense of ...

Added: November 10, 2025

Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

Timofei Gritsaev, Morozov N., Samsonov S. et al., , in: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025).: ICLR, 2025. P. 95626–95646.

Generative Flow Networks (GFlowNets) are a family of generative models that learn to sample objects with probabilities proportional to a given reward function. The key concept behind GFlowNets is the use of two stochastic policies: a forward policy, which incrementally constructs compositional objects, and a backward policy, which sequentially deconstructs them. Recent results show a ...

Added: August 15, 2025

Логики с аксиомой конвергентности: сложность при малом числе переменных в языке

Rybakov M., Щербаков М. И., В кн.: Четырнадцатые Смирновские чтения по логике: материалы Междунар. науч. конф., Москва, 19-21 июня 2025 г.: М.: Издатель Александр Воробьев, 2025. С. 46–49.

Логики с аксиомой конвергентности: сложность при малом числе переменных в языке ...

Added: June 21, 2025

Сложность константных фрагментов ненормальных модальных логик

Kudinov A., Rybakov M., В кн.: Четырнадцатые Смирновские чтения по логике: материалы Междунар. науч. конф., Москва, 19-21 июня 2025 г.: М.: Издатель Александр Воробьев, 2025. С. 36–39.

Показано, что каждая модальная логика, содержащая классическую логику высказываний и содержащаяся в слабой логике Гжегорчика, имеет NP-трудную проблему выполнимости для константного фрагмента. В частности, константные фрагменты ненормальных модальных логик E, EM, EN и EMN являются coNP-полными. ...

Added: June 21, 2025

VIA AI: Reliable Deep Reinforcement Learning for Traffic Signal Control

Герасёв М. С., Kiselev D., Beketov M. et al., , in: 2024 IEEE International Conference on Data Mining (ICDM) Workshops (ICDMW).: Curran Associates, 2024. P. 887–890.

Traffic signal control optimization is an integral part of any modern transportation system. However, modern traffic signal control systems often rely on predetermined fixed rules to adjust traffic signal timings. This paper presents VIA AI - an intelligent traffic signal control system that leverages deep reinforcement learning (RL) applied to count-based traffic data. Our solution ...

Added: March 27, 2025

The beer game bullwhip effect mitigation: a deep reinforcement learning approach

Rozhkov M., Alyamovskaya N., Zakhodiakin G., International Journal of Production Research 2025 Vol. 63 No. 18 P. 6630–6647

This article investigates the application of reinforcement learning (RL) methods to optimise a four-echelon linear supply chain model with stochastic demand. The proposed supply chain configuration is largely based on the production-distribution supply chain of the MIT Supply Chain Beer Game. We show that RL can significantly improve ordering efficiency and overall supply chain performance. ...

Added: March 24, 2025