Abstract |
---|
Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes the maximal $n$-step payoff by iterating $n$ times a recurrence equation naturally associated with the MDP. At the same time, value iteration provides a policy for the MDP that is optimal over a given finite horizon $n$. In this paper, we settle the computational complexity of value iteration. We show that, given a horizon $n$ in binary and an MDP, computing an optimal policy is EXP-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis. As a stepping stone, we show that it is EXP-complete to compute the $n$-fold iteration (with $n$ in binary) of a function given by a straight-line program over the integers with $\max$ and $+$ as operators. |
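The value-iteration procedure described in the abstract can be sketched as follows. This is a minimal illustration on an invented two-state, two-action MDP (the MDP, the state/action names, and the `value_iteration` helper are assumptions for this sketch, not taken from the paper): each Bellman backup maximizes the expected immediate reward plus the payoff-to-go, yielding both the maximal $n$-step payoff and a step-indexed optimal policy.

```python
# Illustrative toy MDP (invented for this sketch, not from the paper):
# trans[s][a] is a list of (probability, next_state, reward) triples.
trans = {
    0: {"stay": [(1.0, 0, 0.0)],
        "go":   [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)],
        "go":   [(1.0, 0, 0.0)]},
}

def value_iteration(trans, horizon):
    """Return (values, policy): the maximal expected n-step payoff per state,
    and a step-indexed optimal policy (policy[k] gives the action to play
    when k steps of the horizon have already elapsed)."""
    states = list(trans)
    values = {s: 0.0 for s in states}      # the 0-step payoff is 0
    policy = [None] * horizon
    for k in range(horizon - 1, -1, -1):   # one Bellman backup per step
        new_values, step_policy = {}, {}
        for s in states:
            # maximize expected immediate reward plus payoff-to-go
            best_a, best_v = max(
                ((a, sum(p * (r + values[t]) for p, t, r in outs))
                 for a, outs in trans[s].items()),
                key=lambda av: av[1],
            )
            step_policy[s], new_values[s] = best_a, best_v
        policy[k] = step_policy
        values = new_values
    return values, policy

values, policy = value_iteration(trans, 3)
print(values)   # maximal expected 3-step payoff from each state
```

Note that this naive loop performs $n$ Bellman backups, so when the horizon $n$ is given in binary its running time is exponential in the input size; the paper's EXP-completeness result shows that, in the complexity-theoretic sense, this exponential cost cannot be avoided.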
Year | Venue | Field |
---|---|---|
2019 | International Colloquium on Automata, Languages and Programming | Integer, Discrete mathematics, Open problem, Markov decision process, Operator (computer programming), Finite horizon, Mathematics, Computational complexity theory, Stochastic game, Binary number
DocType | Citations | PageRank
---|---|---
Conference | 0 | 0.34
References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Nikhil Balaji | 1 | 9 | 4.24 |
Stefan Kiefer | 2 | 345 | 36.87 |
Petr Novotný | 3 | 46 | 3.35 |
Guillermo A. Pérez | 4 | 24 | 3.52 |
Mahsa Shirmohammadi | 5 | 33 | 10.70 |