Optimal Approximation of Average Reward Markov Decision Processes

Computational Mathematics and Mathematical Physics. 2025. Vol. 65. No. 3. P. 567–581.
Sapronov Y., Yudin N.

We continue to develop the approach of finding an ε-optimal policy for Average Reward Markov Decision Processes (AMDP) by reducing the problem to Discounted Markov Decision Processes (DMDP). Existing research often requires that the discount factor not fall below a certain threshold. This threshold is typically close to one, and it is well known that iterative methods for finding the optimal policy of a DMDP become less efficient as the discount factor approaches one.
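The slowdown of iterative methods as the discount factor approaches one can be observed on a toy problem. The following is a minimal sketch, not the authors' method: it runs plain value iteration on a small random MDP and counts the iterations needed to reach a fixed tolerance. The helper names (`make_random_mdp`, `value_iteration`, `iterations_by_gamma`), the MDP size, the seed and the tolerance are all assumptions chosen for illustration.

```python
import random


def make_random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP: P[s][a] is a probability row, R[s][a] lies in [0, 1]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        p_rows, r_row = [], []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            p_rows.append([w / total for w in weights])
            r_row.append(rng.random())
        P.append(p_rows)
        R.append(r_row)
    return P, R


def value_iteration(P, R, gamma, tol=1e-6, max_iter=1_000_000):
    """Iterate the Bellman optimality operator until the sup-norm change is <= tol.

    Returns the final value vector and the number of iterations performed.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for k in range(1, max_iter + 1):
        V_new = [
            max(
                R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        diff = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if diff <= tol:
            return V, k
    return V, max_iter


def iterations_by_gamma(gammas, n_states=5, n_actions=2):
    """Map each discount factor to the iteration count on the same toy MDP."""
    P, R = make_random_mdp(n_states, n_actions)
    return {g: value_iteration(P, R, g)[1] for g in gammas}


if __name__ == "__main__":
    for g, k in iterations_by_gamma([0.5, 0.9, 0.99]).items():
        print(f"gamma={g}: {k} iterations")
```

Because the Bellman operator is a contraction with modulus equal to the discount factor, the iteration count in this sketch grows roughly like 1/(1 - γ), which is why a threshold close to one is costly for iterative solvers.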

Our work differs from existing studies in that it allows the empirical Bellman equation to be solved inexactly, while still maintaining a sample complexity that matches the latest results. In the upper bound, we separate the contribution of the error in approximating the transition matrix from that of the residuals in solving the Bellman equation. As a result, our findings make it possible to determine the total complexity of ε-optimal policy analysis for DMDP under any method that has a theoretical iteration-complexity guarantee.
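The two error sources named above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the procedure or the bound from the paper: a hypothetical two-state MDP is sampled through a generative model to build an empirical transition matrix, the empirical Bellman equation is then solved only approximately by a fixed number of value-iteration sweeps, and the two quantities (transition estimation error and Bellman residual) are reported separately. All function names, the toy MDP and the parameters are invented for this example.

```python
import random


def sample_empirical_model(P, n_samples, rng):
    """Estimate every transition row from n_samples draws of a generative model."""
    n_states = len(P)
    P_hat = []
    for s in range(n_states):
        rows = []
        for probs in P[s]:
            counts = [0] * n_states
            for nxt in rng.choices(range(n_states), weights=probs, k=n_samples):
                counts[nxt] += 1
            rows.append([c / n_samples for c in counts])
        P_hat.append(rows)
    return P_hat


def bellman_backup(P, R, V, gamma):
    """Apply the Bellman optimality operator of the model (P, R) to V once."""
    return [
        max(
            R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(P[s]))
        )
        for s in range(len(P))
    ]


def inexact_solve(P_hat, R, gamma, n_iters):
    """Run n_iters sweeps on the empirical model and return (V, Bellman residual)."""
    V = [0.0] * len(P_hat)
    for _ in range(n_iters):
        V = bellman_backup(P_hat, R, V, gamma)
    TV = bellman_backup(P_hat, R, V, gamma)
    residual = max(abs(t - v) for t, v in zip(TV, V))
    return V, residual


def model_error(P, P_hat):
    """Largest L1 distance between a true and an estimated transition row."""
    return max(
        sum(abs(p - q) for p, q in zip(P[s][a], P_hat[s][a]))
        for s in range(len(P))
        for a in range(len(P[s]))
    )


if __name__ == "__main__":
    # Hypothetical two-state, two-action MDP with rewards in [0, 1].
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
    R = [[1.0, 0.0], [0.3, 0.6]]
    rng = random.Random(0)
    for n in (10, 1000):
        P_hat = sample_empirical_model(P, n, rng)
        _, res = inexact_solve(P_hat, R, gamma=0.9, n_iters=50)
        print(f"n={n}: model error {model_error(P, P_hat):.3f}, residual {res:.2e}")
```

In this sketch the model error is controlled only by the number of samples, and the residual only by the number of sweeps, so the two can be tuned independently; the residual after k sweeps from a zero start is at most γ^k times the largest reward.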

Research target: Mathematics, Computer Science
Language: English
Keywords: Markov Decision Processes; computational complexity; reinforcement learning (RL); algorithms and algorithmic complexity; sample size; sample complexity; iteration complexity