### Book chapter

## Markov Decision Processes and Stochastic Games with Total Effective Payoff

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a saddle point in uniformly optimal pure stationary strategies.

### In book

Contents of the book is divided into 2 parts of deterministic and stochastic models of Operations Research.

The first part of "Deterministic models of Operations Research" - is the base section, in which the emphasis is on linear programming.

The second part - "Stochastic models of Operations Research" includes a model of reliability and queuing models. This is original material.

The textbook can be useful to students of undergraduate and graduate programs in areas of training in "Applied Mathematics", "Applied Mathematics and Computer Science", "Information systems and technologies", as well as graduate students and science teachers who are interested in the problems of optimization in stochastic models

The paper discusses a new approach to developing tools for quantitatively analyzing the financial behavior of small and medium price-taking traders each possessing abilities to predict share price values for a set of financial securities traded in a stock exchange. Tools for forming and managing a trader’s portfolio of securities from this set are proposed. Particularly, it is shown that when the trader can treat share price values from the portfolio as random variables with known (to her) distributions, an optimal portfolio composition is found by solving a linear programming problem. Otherwise, this optimal composition is found as the trader’s equilibrium strategy in an antagonistic two-person game with the stock exchange being the other player. In this game on polyhedra of disjoint player strategies, described by systems of linear equations and inequalities of a balance kind, calculating saddle points is reduced to solving linear programming problems forming a dual pair.

The manual is devoted to the mathematical theory and methods of optimization applied to administrative decisions in economy. Volume 1 described approaches to mathematical modeling of management problems in economy and methods of mathematical programming tasks solution. Besides strict mathematical proofs, there are directing reasons, which is sometimes enough for understanding. There are many economic examples and exercises with detailed solutions. Readers are supposed to know the bases of the mathematical analysis and linear algebra, though necessary data from these courses in a concise form are provided in appendices.

World leading experts give their accounts of the modern mathematical models in the field: Markov Decision Processes, controlled diffusions, piece-wise deterministic processes etc, with a wide range of performance functionals. One of the aims is to give a general view on the state-of-the-art. The authors use Dynamic Programming, Convex Analytic Approach, several numerical methods, index-based approach and so on. Most chapters either contain well developed examples, or are entirely devoted to the application of the mathematical control theory to real life problems from such fields as Insurance, Portfolio Optimization and Information Transmission. The book will enable researchers, academics and research students to get a sense of novel results, concepts, models, methods, and applications of controlled stochastic processes.

The maximin of a function being the minimum function of a sum of two bilinear functions with one and the same first vector argument belonging to a polyhedron is considered on a polyhedron of connected variables forming two second vector arguments of the bilinear functions. It is shown that finding the exact lower estimate of this maximin is reducible to solving a quadratic programming problem.

The issue of using the MathCAD software package in a university educational course for learning to solve optimization problems is considered. The advantage of working with this program is shown and its main features are discussed in the appendix to this course.

A new statistical approach to alignment (finding the longest common subsequence) of two random RNA-type sequences is proposed. We have constructed a generalized ‘dynamic programming’ algorithm for finding the extreme value of the free energy of two noncoding RNAs. In our procedure, we take into account the binding free energy of two random heteropolymer chains which are capable of forming the cloverleaf-like spatial structures typical for RNA molecules. The algorithm is based on two observations: (i) the standard alignment problem can be considered as a zero-temperature limit of a more general statistical problem of binding of two associating heteropolymer chains; (ii) this last problem can be generalized naturally to consider sequences with hierarchical cloverleaf-like structures (i.e. of RNA type). The approach also permits us to perform a ‘secondary structure recovery’. Namely, we can predict the optimal secondary structures of interacting RNAs in a zero-temperature limit knowing only their primary sequences.