Title
Interactive value iteration for Markov decision processes with unknown rewards
Abstract
To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that the rewards, whose numeric values are unknown, can only be ordered, and that a tutor is available to compare sequences of rewards. We first show how the set of reward functions compatible with a given preference relation can be represented as a polytope. Our algorithm, called Interactive Value Iteration, then searches for an optimal policy while refining its knowledge of the possible reward functions, querying the tutor only when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results demonstrating that our approach is efficient in practice.
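The query-based mechanism the abstract describes can be illustrated with a toy sketch. This is a deliberate simplification, not the paper's actual algorithm: rewards take ordinal levels r0 < r1 < r2 with unknown numeric values, values are tracked as vectors of discounted reward-level counts, vectors are compared by a stochastic-dominance test that uses only the known ordering, and an oracle "tutor" (which privately knows the numeric values) is queried only when that test is inconclusive. The MDP, the hidden values, and all names below are illustrative assumptions.

```python
GAMMA = 0.9
HIDDEN = [0.0, 0.3, 1.0]   # tutor's private numeric values for levels r0 < r1 < r2
queries = 0                # number of times the tutor was asked

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tutor_prefers(u, v):
    """Oracle comparison using the hidden numeric rewards (counts a query)."""
    global queries
    queries += 1
    return dot(u, HIDDEN) >= dot(v, HIDDEN)

def dominates(u, v):
    """u dominates v if, for every k, u carries at least as much discounted
    weight on the top-k reward levels; decidable from the ordering alone."""
    cu = cv = 0.0
    for a, b in zip(reversed(u), reversed(v)):
        cu += a
        cv += b
        if cu < cv - 1e-9:
            return False
    return True

def better(u, v):
    if dominates(u, v):
        return True
    if dominates(v, u):
        return False
    return tutor_prefers(u, v)   # ambiguous under the ordering: ask the tutor

# Deterministic 2-state MDP: NEXT[s][a] is the successor state,
# R[s][a] the index of the ordinal reward level collected.
NEXT = {0: {'a': 0, 'b': 1}, 1: {'a': 1, 'b': 0}}
R    = {0: {'a': 1, 'b': 0}, 1: {'a': 2, 'b': 0}}

def backup(s, a, V):
    """One Bellman backup on reward-level count vectors."""
    vec = [GAMMA * x for x in V[NEXT[s][a]]]
    vec[R[s][a]] += 1.0
    return vec

V = {s: [0.0] * len(HIDDEN) for s in NEXT}
for _ in range(100):             # value-iteration sweeps
    newV = {}
    for s in NEXT:
        best = None
        for a in NEXT[s]:
            cand = backup(s, a, V)
            if best is None or better(cand, best):
                best = cand
        newV[s] = best
    V = newV
```

Under the hidden values, staying in state 1 (collecting r2 forever) is optimal, so `dot(V[1], HIDDEN)` converges to 1/(1-GAMMA) = 10; from state 0 the better action is to move to state 1, giving about 9. Early sweeps are resolved by dominance alone, but once both candidates at state 0 mix different reward levels the dominance test becomes inconclusive and the tutor is queried, mirroring (in miniature) the elicit-while-optimizing interleaving of the paper.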
Year
2013
Venue
IJCAI
Keywords
value iteration, interactive value iteration, Markov decision process, possible reward function, unknown reward, hard task, new approach, reward function, optimal policy, numeric value
DocType
Conference
Citations
9
PageRank
0.54
References
14
Authors
2
Name | Order | Citations | PageRank
Paul Weng | 1 | 157 | 16.39
Bruno Zanuttini | 2 | 289 | 25.43