Markov Decision Processes and Stochastic Games with Total Effective Payoff

V. Gurvich; Boros E.; Elbassioni K.; Makino K.

doi:10.4230/LIPIcs.STACS.2015.103

Publications

?

Markov Decision Processes and Stochastic Games with Total Effective Payoff

P. 103–115.

Gurvich V., Boros E., Elbassioni K., Makino K.

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a saddle point in uniformly optimal pure stationary strategies.

Keywords: Markov Decision Processes linear programming undiscounted stochastic games mean payoff total payoff

In book

32nd International Symposium on Theoretical Aspects of Computer Science (STACS 2015), Leibniz International Proceedings in Informatics (LIPIcs)

Vol. 30. , Dagstuhl: Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2015.

Weighted mesh algorithms for general Markov decision processes: Convergence and tractability

Belomestny D., Schoenmakers J., Zorina V., Journal of Complexity 2025 Vol. 88 Article 101932

We introduce a mesh-type approach for tackling discrete-time, finite-horizon Markov Decision Processes (MDPs) characterized by state and action spaces that are general, encompassing both finite and infinite (yet suitably regular) subsets of Euclidean space. In particular, for bounded state and action spaces, our algorithm achieves a computational complexity that is tractable in the sense of ...

Added: November 10, 2025

Optimal Approximation of Average Reward Markov Decision Processes

Sapronov Y., Yudin N., Computational Mathematics and Mathematical Physics 2025 Vol. 65 No. 3 P. 567–581

We continue to develop the concept of studying the ε-optimal policy for Average Reward Markov Decision Processes (AMDP) by reducing it to Discounted Markov Decision Processes (DMDP). Existing research often stipulates that the discount factor must not fall below a certain threshold. Typically, this threshold is close to one, and as is well-known, iterative methods ...

Added: June 10, 2025

On the problem of optimal fair exchange

Kolesnikov A., Popova S., / Series arXiv "math". 2024.

We consider the problem of optimal exchange which can be formulated as a kind of optimal transportation problem. The existence of an optimal solution and a duality theorem for the optimal exchange problem are proved in case of completely regular topological spaces. We show the connection between the problem of optimal exchange and the optimal ...

Added: December 20, 2024

Обзор выпуклой оптимизации марковских процессов принятия решений

Rudenko V., Yudin N., Васин А. А., Компьютерные исследования и моделирование 2023 Т. 15 № 2 С. 329–353

This article reviews both historical achievements and modern results in the field of Markov Decision Process (MDP) and convex optimization. This review is the first attempt to cover the field of reinforcement learning in Russian in the context of convex optimization. The fundamental Bellman equation and the criteria of optimality of policy — strategies based on it, ...

Added: November 29, 2024

Algorithm for Constrained Markov Decision Process with Linear Convergence

Gladin E., Lavrik-Karmazin M., Zainullina K. et al., Proceedings of Machine Learning Research 2023 Vol. 206 P. 11506–11533

The problem of constrained Markov decision process is considered. An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs (the number of constraints is relatively small). A new dual approach is proposed with the integration of two ingredients: entropy-regularized policy optimizer and Vaidya’s dual optimizer, both of which ...

Added: November 6, 2024

On a simple connection between Δ-modular ILP and LP, and a new bound on the number of integer vertices

Gribanov D., Malyshev D., Shumilov I., Operations Research Forum 2024 Vol. 5 Article 32

In our note, we present a very simple and short proof of a new interesting fact about the faces of an integer hull of a given rational polyhedron. This fact has a complete analog in linear programming theory and can be useful to establish new constructive upper bounds on the number of vertices in an integer hull of ...

Added: April 4, 2024

Exact Algorithm for Generating H-Cores in Simplified Lattice-Based Protein Model

Ignatov A., , in: 14th International Conference, OPTIMA 2023, Petrovac, Montenegro, September 18–22, 2023, Revised Selected Papers. Communications in Computer and Information Science (CCIS, volume 1913)Vol. 1913.: Springer, 2023. P. 173–187.

Modeling protein folding, which is the process by which a protein obtains its spacial shape, still remains a challenging problem. Protein geometry might be simplified by using the coarse-grained models. The highest level of simplification is achieved in HP-models where only polarity of amino acid residues is considered, and the unified monomers are located in nodes ...

Added: January 18, 2024

О безопасности схемы биометрической аутентификации, основанной на нейронной сети

Маршалко Г. Б., Математические вопросы криптографии 2014 Vol. 5 No. 2 P. 87–98

We show that neuron weights used in neural network-based biometric authentication scheme defined in GOST R 52633 standard series contain all the information on biometric data and secret key of the legitimate user. So, the complexity of evaluating (with known tables of neuron weights) the legitimate user's secret key is equivalent to the complexity of evaluating ...

Added: October 7, 2022

A Constrained Shortest Path Scheme for Virtual Network Service Management

Chemodanov D., Esposito F., Calyam P. et al., IEEE Transactions on Network and Service Management 2019 Vol. 16 No. 1 P. 127–142

Virtual network services that span multiple data centers are important to support emerging data-intensive applications in fields such as bioinformatics and retail analytics. Successful virtual network service composition and maintenance requires flexible and scalable “constrained shortest path management” both in the management plane for virtual network embedding (VNE) or network function virtualization service chaining (NFV-SC), ...

Added: December 3, 2019

Минимаксная оптимизация в задачах схемотехнического проектирования

Гурарий М. М., Жаров М. М., Русаков С. Г. et al., Информационные технологии 2018 Т. 24 № 7 С. 435–444

The directions of improvement of minimax methods for circuit design problems are considered. The choices of generalized quality criterion for the circuit design is discussed. It is concluded that the minimax criterion has advantages over other formulations of design targets. New approach to setting of individual objectives for each performance indicator is proposed. The approach ...

Added: February 12, 2019

A Potential Reduction Algorithm for Two-Person Zero-Sum Mean Payoff Stochastic Games

Gurvich V., Boros E., Elbassioni K. et al., Dynamic Games and Applications 2018 Vol. 8 No. 1 P. 22–41

We suggest a new algorithm for two-person zero-sum undiscounted stochastic games focusing on stationary strategies. Given a positive real εε, let us call a stochastic game εε-ergodic, if its values from any two initial positions differ by at most εε. The proposed new algorithm outputs for every ε>0ε>0 in finite time either a pair of stationary strategies for the two ...

Added: October 10, 2018

Evaluating Typical Algorithms of Combinatorial Optimization to Solve Continuous-Time Based Scheduling Problem

Lazarev A. A., Pravdivets N., Nekrasov I., Algorithms 2018 Vol. 11 No. 4 P. 1–13

We consider one approach to formalize the Resource-Constrained Project Scheduling Problem (RCPSP) in terms of combinatorial optimization theory. The transformation of the original problem into combinatorial setting is based on interpreting each operation as an atomic entity that has a defined duration and has to be resided on the continuous time axis meeting additional restrictions. ...

Added: October 1, 2018

Применение программного пакета MathCAD для обучения решению задач линейного программирования

Mikheev A. V., В кн.: Современное образование: содержание, технологии, качество. Материалы XXIV международной научно-методической конференции.Т. 2.: СПб.: Издательство СПбГЭТУ "ЛЭТИ", 2018. С. 55–56.

The issue of using the MathCAD software package in a university educational course for learning to solve optimization problems is considered. The advantage of working with this program is shown and its main features are discussed in the appendix to this course. ...

Added: April 24, 2018

A convex programming-based algorithm for mean payoff stochastic games with perfect information

Boros E., Elbassioni K., Gurvich V. et al., Optimization Letters 2017 Vol. 11 No. 8 P. 1499–1512

We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph (Formula presented.), with local rewards (Formula presented.), and three types of positions: black (Formula presented.), white (Formula presented.), and random (Formula presented.) forming a partition of V. It is a long-standing open question whether a polynomial time algorithm ...

Added: May 18, 2017

Исследование операций. Линейное программирование и стохастические модели.

Kashtanov V., Зайцева О. Б., М.: КУРС: ИНФРА-М, 2016.

Contents of the book is divided into 2 parts of deterministic and stochastic models of Operations Research. The first part of "Deterministic models of Operations Research" - is the base section, in which the emphasis is on linear programming. The second part - "Stochastic models of Operations Research" includes a model of reliability and queuing models. This ...

Added: November 13, 2016

Finding the exact lower estimate of the maximin of a minimum function on a polyhedron of connected variables

Belenky A., Applied Mathematics Letters 2007 Vol. 20 No. 7 P. 795–799

The maximin of a function being the minimum function of a sum of two bilinear functions with one and the same first vector argument belonging to a polyhedron is considered on a polyhedron of connected variables forming two second vector arguments of the bilinear functions. It is shown that finding the exact lower estimate of ...

Added: October 21, 2016

Two approaches to modeling the interaction of small and medium price-taking traders with a stock exchange by mathematical programming techniques

Belenky A., Egorova L., / Series WP7 "Математические методы анализа решений в экономике, бизнесе и политике". 2016. No. 02.

The paper presents two new approaches to modeling the interaction of small and medium price-taking traders with a stock exchange. In the framework of these approaches, the traders can form and manage their portfolios of financial instruments traded on a stock exchange with the use of linear, integer, and mixed programming techniques. Unlike previous authors’ ...

Added: June 27, 2016

An Approach to Forming and Managing a Portfolio of Financial Securities by Small and Medium Price-Taking Traders in a Stock Exchange

Belenky A., Egorova L., , in: Advances in Intelligent Systems and ComputingIssue 359: Modelling, Computation and Optimization in Information Systems and Management Sciences.: Switzerland: Springer, 2015. P. 257–268.

The paper discusses a new approach to developing tools for quantitatively analyzing the financial behavior of small and medium price-taking traders each possessing abilities to predict share price values for a set of financial securities traded in a stock exchange. Tools for forming and managing a trader’s portfolio of securities from this set are proposed. ...

Added: June 1, 2015

Две модели принятия решений участником торгов на фондовой бирже по формированию и изменению своего инвестиционного портфеля

Belenky A., Egorova L., / Series WP7 "Математические методы анализа решений в экономике, бизнесе и политике". 2015. No. WP7/2015/02.

Two mathematical models formalizing the decision-making process by a trader on developing and changing her investment portfolio in a stock exchange are presented. According to the first model the trader can correctly predict future values of financial securities of her interest. In this case, the problem of finding optimal strategies of investing in these financial ...

Added: May 31, 2015

Finding Optimal Production and Selling Strategies for an Electricity Generator in a Part of a Country’s Electrical Grid

Belenky A., Procedia Computer Science 2014 Vol. 31 P. 1150–1159

A part of a country’s electrical grid in which an electricity generator (which may consist of several base load power plants and several peaking power plants) supplies electricity to a set of large customers of the grid, whereas the customers can a) receive electricity from renewable sources of energy, b) store electricity in certain volumes, and c) ...

Added: September 29, 2014