### Article

## A convex programming-based algorithm for mean payoff stochastic games with perfect information

We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph (Formula presented.), with local rewards (Formula presented.), and three types of positions: black (Formula presented.), white (Formula presented.), and random (Formula presented.) forming a partition of V. It is a long-standing open question whether a polynomial time algorithm for BWR-games exists, even when (Formula presented.). In fact, a pseudo-polynomial algorithm for BWR-games would already imply their polynomial solvability. In this short note, we show that BWR-games can be solved via convex programming in pseudo-polynomial time if the number of random positions is a constant.

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a saddle point in uniformly optimal pure stationary strategies.

We suggest a new algorithm for two-person zero-sum undiscounted stochastic games focusing on stationary strategies. Given a positive real ϵϵ, let us call a stochastic game ϵϵ-ergodic, if its values from any two initial positions differ by at most ϵϵ. The proposed new algorithm outputs for every ϵ>0ϵ>0 in finite time either a pair of stationary strategies for the two players guaranteeing that the values from any initial positions are within an ϵϵ-range, or identifies two initial positions uu and vv and corresponding stationary strategies for the players proving that the game values starting from uu andvv are at least ϵ/24ϵ/24 apart. In particular, the above result shows that if a stochastic game is ϵϵ-ergodic, then there are stationary strategies for the players proving 24ϵ24ϵ-ergodicity. This result strengthens and provides a constructive version of an existential result by Vrieze (1980) claiming that if a stochastic game is 00-ergodic, then there are ϵϵ-optimal stationary strategies for every ϵ>0ϵ>0. The suggested algorithm extends the approach recently introduced for stochastic games with perfect information, and is based on the classical potential transformation technique that changes the range of local values at all positions without changing the normal form of the game.

We give an example of a three-person deterministic graphical game that has no Nash equilibrium in pure stationary strategies. The game has seven positions, four outcomes (a unique cycle and three terminal positions), and its normal form is of size 2×2×4 only. Thus, the example strengthens significantly the one obtained in 2014 by Gurvich and Oudalov; the latter has four players, five terminals, and normal form of size 2×4×6×8. Furthermore, our example is minimal with respect to the number of players. Somewhat similar examples were known since 1975, but they were not related to deterministic graphical games. The small size of our example allows us to verify that it has no Nash equilibrium not only in pure but also in independently mixed (so-called behavioral) strategies. For independently mixed strategies two distinct effective payoffs can be considered: along with the classical Markovian evaluation, we also consider *a priori* evaluation, which may be a better fit for playing in behavioral strategies. We show that in both cases Nash equilibria may fail to exist.

We study existence of Nash equilibria (NE) in pure stationary strategies in n-person positional games with no moves of chance, with perfect information, and with the mean or total effective cost function.

We construct a NE-free three-person game with positive local costs, thus disproving the conjecture suggested in Boros and Gurvich (2003). Still, the following four problems remain open: Whether NE exist in all *two*-person games with total effective costs such that

(I) all local costs are strictly positive *or* (II) there are no dicycles of the total cost zero?

Whether NE exist in all n-person games with the terminal (transition-free) cost functions, provided all dicycles form a unique outcome c and

(III) assuming that c is worse than any terminal outcome *or* (IV) without this assumption?

For n=3 the case (I) (and hence (II)) is answered in the negative. This is the main result of the present paper. For n=2 the case (IV) (and hence (III)) was answered in the positive earlier.

We briefly survey the above and some other negative and positive results on Nash-solvability in pure stationary strategies for the games under consideration.

We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph G=(V,E), with local rewards r:E→Z, and three types of positions: black VB, white VW, and random VR forming a partition of *V*. It is a long-standing open question whether a polynomial time algorithm for BWR-games exists, or not, even when |VR|=0. In fact, a pseudo-polynomial algorithm for BWR-games would already imply their polynomial solvability. In this paper,1 we show that BWR-games with a constant number of random positions can be solved in pseudo-polynomial time. More precisely, in any BWR-game with |VR|=O(1), a saddle point in uniformly optimal pure stationary strategies can be found in time polynomial in |VW|+|VB|, the maximum absolute local reward, and the common denominator of the transition probabilities.