Seminar Session

ACSIOM Seminar

Tuesday, 8 April 2008 at 10:00 - Room 431

Jérôme Renault (Université Toulouse I)

On the existence of the uniform value in dynamic programming and in Markov decision processes

We consider dynamic programming problems and give sufficient conditions for the existence of the uniform value. As a consequence, we obtain the existence of the uniform value when the state space is compact metric, payoffs are continuous and the transition correspondence is non-expansive. We also apply our results to Markov Decision Processes and obtain a few generalizations.

More precisely, we consider dynamic programming problems with infinite time horizon, given by a state space $Z$, a correspondence from $Z$ to $Z$, a bounded reward function defined on $Z$, and an initial state $z$. We denote, for each $n$, the value of the $n$-stage problem with average payoffs by $v_n$. By definition, the uniform value exists if $(v_n)_n$ converges to some limit $v$ and, for each $\varepsilon > 0$, there exists a play giving a payoff not lower than $v-\varepsilon$ in any sufficiently long $n$-stage problem. So when the uniform value exists, a decision maker can play $\varepsilon$-optimally simultaneously in all long enough problems. Monderer and Sorin (1993) and Lehrer and Monderer (1994) answered a question raised by Mertens (1987) and showed that the uniform convergence of $(v_n)_n$ (and/or of the discounted values $(v_\lambda)_\lambda$) is not enough to ensure the existence of the uniform value.

We define, for every $m$ and $n$, a value $w_{m,n}$ as the supremum payoff the decision maker can achieve when his payoff is defined as the minimum, for $t=1,\dots,n$, of his average rewards computed between stages $m+1$ and $m+t$. Our main result states that if the set of states is precompact metric and the family $(w_{m,n})_{m,n}$ is uniformly equicontinuous, then the uniform value exists. Moreover, it is equal to $\sup_{m\geq 0} \inf_{n\geq 1} w_{m,n}(z) = \inf_{n\geq 1} \sup_{m\geq 0} w_{m,n}(z)$, and other relations are obtained.
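The values $v_n$ and the auxiliary values $w_{m,n}$ defined above can be computed by brute force on a small example. Below is a minimal Python sketch on a hypothetical three-state deterministic problem (the states `Z`, successor correspondence `F` and reward `r` are invented for illustration and are not from the talk):

```python
# Hypothetical toy dynamic programming problem: state space Z,
# transition correspondence F (possible successors), reward r in [0, 1].
Z = [0, 1, 2]
F = {0: [0, 1], 1: [2], 2: [2]}   # successors of each state
r = {0: 0.0, 1: 1.0, 2: 0.5}      # bounded reward on states

def plays(z, length):
    """All feasible plays z_0 = z, z_1, ..., z_length with z_{t+1} in F(z_t)."""
    if length == 0:
        yield (z,)
        return
    for head in plays(z, length - 1):
        for nxt in F[head[-1]]:
            yield head + (nxt,)

def v(n, z):
    """n-stage value: best average reward over stages 1..n."""
    return max(sum(r[p[t]] for t in range(1, n + 1)) / n
               for p in plays(z, n))

def w(m, n, z):
    """Best, over plays, of the minimum over t = 1..n of the
    average reward computed between stages m+1 and m+t."""
    return max(min(sum(r[p[s]] for s in range(m + 1, m + t + 1)) / t
                   for t in range(1, n + 1))
               for p in plays(z, m + n))
```

On this toy problem one can check that, from the initial state 0, $v_n(0) = 1/2 + 1/(2n)$ and $w_{m,n}(0) = 1/2 + 1/(2n)$ for every $m$, so both iterated sup/inf expressions equal the uniform value $1/2$.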
As an application, we consider probabilistic Markov Decision Processes (MDPs) and show: 1) in a standard MDP with a finite set of states and an arbitrary set of actions, the uniform value exists, and 2) if the decision maker can randomly select his actions, the same result holds under imperfect observation of the state (generalizing Rosenberg, Solan and Vieille, 2002). Finally, our result can also be used to prove the existence of the uniform value in a special class of stochastic games, which leads to the existence of the value in repeated games with an informed controller.
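Result 1) can be observed numerically: in a finite-state MDP, the $n$-stage average values computed by backward induction stabilize around a limit, uniformly in the initial state. A short Python sketch on a hypothetical two-state, two-action MDP (deterministic transitions, chosen only for illustration):

```python
# Hypothetical finite MDP with 2 states and 2 actions (not from the talk).
states = [0, 1]
actions = ['a', 'b']
# (next state, reward) for each (state, action); rewards in [0, 1]
step = {(0, 'a'): (0, 0.2), (0, 'b'): (1, 0.0),
        (1, 'a'): (1, 0.8), (1, 'b'): (0, 1.0)}

def n_stage_values(n):
    """Return [v_1, ..., v_n], where v_k(z) is the best total reward
    over k stages from z, divided by k (backward induction)."""
    S = {z: 0.0 for z in states}          # S_0 = 0
    vals = []
    for k in range(1, n + 1):
        S = {z: max(rew + S[nz]
                    for nz, rew in (step[(z, a)] for a in actions))
             for z in states}
        vals.append({z: S[z] / k for z in states})
    return vals
```

Here $v_n(1) = 0.8 + 0.2/n$ and $v_n(0) = 0.8 - 0.6/n$ for $n \geq 2$, so the values from both states converge to the same limit $0.8$, consistent with the existence of the uniform value in finite-state MDPs.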