### Article

## A pseudo-polynomial algorithm for mean payoff stochastic games with perfect information and few random positions

We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph G=(V,E), with local rewards r:E→Z, and three types of positions: black VB, white VW, and random VR forming a partition of *V*. It is a long-standing open question whether a polynomial time algorithm for BWR-games exists, or not, even when |VR|=0. In fact, a pseudo-polynomial algorithm for BWR-games would already imply their polynomial solvability. In this paper,1 we show that BWR-games with a constant number of random positions can be solved in pseudo-polynomial time. More precisely, in any BWR-game with |VR|=O(1), a saddle point in uniformly optimal pure stationary strategies can be found in time polynomial in |VW|+|VB|, the maximum absolute local reward, and the common denominator of the transition probabilities.

For matrix games we study how small nonzero probability must be used in optimal strategies. We show that for n×n win–lose–draw games (i.e. (−1,0,1) matrix games) nonzero probabilities smaller than n−O(n)are never needed. We also construct an explicit n×n win–lose game such that the unique optimal strategy uses a nonzero probability as small as n−Ω(n). This is done by constructing an explicit (−1,1)nonsingular n×n matrix, for which the inverse has only nonnegative entries and where some of the entries are of value nΩ(n).

We study existence of Nash equilibria (NE) in pure stationary strategies in n-person positional games with no moves of chance, with perfect information, and with the mean or total effective cost function.

We construct a NE-free three-person game with positive local costs, thus disproving the conjecture suggested in Boros and Gurvich (2003). Still, the following four problems remain open: Whether NE exist in all *two*-person games with total effective costs such that

(I) all local costs are strictly positive *or* (II) there are no dicycles of the total cost zero?

Whether NE exist in all n-person games with the terminal (transition-free) cost functions, provided all dicycles form a unique outcome c and

(III) assuming that c is worse than any terminal outcome *or* (IV) without this assumption?

For n=3 the case (I) (and hence (II)) is answered in the negative. This is the main result of the present paper. For n=2 the case (IV) (and hence (III)) was answered in the positive earlier.

We briefly survey the above and some other negative and positive results on Nash-solvability in pure stationary strategies for the games under consideration.

We suggest a new algorithm for two-person zero-sum undiscounted stochastic games focusing on stationary strategies. Given a positive real ϵϵ, let us call a stochastic game ϵϵ-ergodic, if its values from any two initial positions differ by at most ϵϵ. The proposed new algorithm outputs for every ϵ>0ϵ>0 in finite time either a pair of stationary strategies for the two players guaranteeing that the values from any initial positions are within an ϵϵ-range, or identifies two initial positions uu and vv and corresponding stationary strategies for the players proving that the game values starting from uu andvv are at least ϵ/24ϵ/24 apart. In particular, the above result shows that if a stochastic game is ϵϵ-ergodic, then there are stationary strategies for the players proving 24ϵ24ϵ-ergodicity. This result strengthens and provides a constructive version of an existential result by Vrieze (1980) claiming that if a stochastic game is 00-ergodic, then there are ϵϵ-optimal stationary strategies for every ϵ>0ϵ>0. The suggested algorithm extends the approach recently introduced for stochastic games with perfect information, and is based on the classical potential transformation technique that changes the range of local values at all positions without changing the normal form of the game.

Supposing that Player 1’s computational power is higher than that of Player 2, we give three examples of different kinds of public signal about the state of a two-person zero-sum game with symmetric incom- plete information on both sides (both players do not know the state of the game) where Player 1 due to his computational power learns the state of the game meanwhile it is impossible for Player 2. That is, the game with incomplete information on both sides becomes a game with incomplete information on the side of Player 2. Thus we demonstrate that information about the state of a game may appear not only due to a private signal but as a result of a public signal and asymmetric computational resources of players.

We consider Gillette’s two-person zero-sum stochastic games with perfect information. For each k∈N={0,1,…}k∈N={0,1,…} we introduce an effective reward function, called k-total. For k=0k=0 and 1 this function is known as mean payoff and total reward, respectively. We restrict our attention to the deterministic case. For all k, we prove the existence of a saddle point which can be realized by uniformly optimal pure stationary strategies. We also demonstrate that k-total reward games can be embedded into (k+1)(k+1)-total reward games.

We consider certain spaces of functions on the circle, which naturally appear in harmonic analysis, and superposition operators on these spaces. We study the following question: which functions have the property that each their superposition with a homeomorphism of the circle belongs to a given space? We also study the multidimensional case.

We consider the spaces of functions on the m-dimensional torus, whose Fourier transform is p -summable. We obtain estimates for the norms of the exponential functions deformed by a C1 -smooth phase. The results generalize to the multidimensional case the one-dimensional results obtained by the author earlier in “Quantitative estimates in the Beurling—Helson theorem”, Sbornik: Mathematics, 201:12 (2010), 1811 – 1836.

We consider the spaces of function on the circle whose Fourier transform is p-summable. We obtain estimates for the norms of exponential functions deformed by a C1 -smooth phase.