Abstract |
---|
To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible reward functions for a given preference relation can be represented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the possible reward functions, querying the tutor when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results which demonstrate that our approach is efficient in practice. |
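The abstract's core idea can be sketched in code: run value iteration with vector-valued state values (one component per ordinal reward level), prefer one action over another without any query when its value vector dominates for every admissible reward assignment, and ask the tutor only when the two vectors are incomparable. This is a minimal illustrative sketch, not the paper's actual algorithm: the dominance test, the MDP encoding, and all names (`ivi`, `suffix_dominates`, `tutor`) are assumptions made here for the example.

```python
def suffix_dominates(v, w, tol=1e-9):
    # Assumption: reward levels are indexed in nondecreasing order and all
    # compared vectors have equal totals (true here, since every backup adds
    # exactly one reward occurrence per state). Then v is at least as good
    # as w for EVERY admissible reward assignment iff each suffix sum of v
    # is >= the matching suffix sum of w.
    sv = sw = 0.0
    for a, b in zip(reversed(v), reversed(w)):
        sv += a
        sw += b
        if sv < sw - tol:
            return False
    return True

def ivi(n_states, n_actions, k, P, L, tutor, gamma=0.9, iters=300):
    """Hypothetical sketch of interactive value iteration.

    V[s] is a length-k vector: component i is the expected discounted number
    of times reward level i is collected. P[s][a] = [(s2, prob), ...] are the
    transitions, L[s][a] is the ordinal reward level earned by taking a in s,
    and tutor(v, w) answers whether value vector v is preferred to w.
    Returns the final value vectors and the number of tutor queries made.
    """
    V = [[0.0] * k for _ in range(n_states)]
    queries = 0
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            best = None
            for a in range(n_actions):
                # Bellman backup: one occurrence of level L[s][a], plus the
                # discounted expected value vector of the successor states.
                q = [0.0] * k
                q[L[s][a]] += 1.0
                for s2, p in P[s][a]:
                    for i in range(k):
                        q[i] += gamma * p * V[s2][i]
                if best is None or suffix_dominates(q, best):
                    best = q
                elif not suffix_dominates(best, q):
                    # Incomparable under the ordinal information alone:
                    # only now do we spend a query on the tutor.
                    queries += 1
                    if tutor(q, best):
                        best = q
            new_v.append(best)
        V = new_v
    return V, queries
```

Because dominated actions are discarded without interaction, the tutor is consulted only on genuinely ambiguous comparisons, which is the mechanism the abstract's query bound relies on.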
Year | Venue | Keywords |
---|---|---|
2013 | IJCAI | value iteration, interactive value iteration, markov decision process, possible reward function, unknown reward, hard task, new approach, reward function, optimal policy, numeric value |
DocType | Citations | PageRank |
---|---|---|
Conference | 9 | 0.54 |
References | Authors |
---|---|
14 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Paul Weng | 1 | 157 | 16.39 |
Bruno Zanuttini | 2 | 289 | 25.43 |