Abstract |
---|
To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible reward functions for a given preference relation can be represented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the possible reward functions, querying the tutor when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results which demonstrate that our approach is efficient in practice. |
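The abstract's core idea can be sketched in code: run value iteration with vector-valued state values (one component per ordinal reward level), prefer one action over another without any query when its value vector dominates for every admissible reward assignment, and ask the tutor only when the two vectors are incomparable. This is a minimal illustrative sketch, not the paper's actual algorithm: the dominance test, the MDP encoding, and all names (`ivi`, `suffix_dominates`, `tutor`) are assumptions made here for the example.

```python
def suffix_dominates(v, w, tol=1e-9):
    # Assumption: reward levels are indexed in nondecreasing order and all
    # compared vectors have equal totals (true here, since every backup adds
    # exactly one reward occurrence per state). Then v is at least as good
    # as w for EVERY admissible reward assignment iff each suffix sum of v
    # is >= the matching suffix sum of w.
    sv = sw = 0.0
    for a, b in zip(reversed(v), reversed(w)):
        sv += a
        sw += b
        if sv < sw - tol:
            return False
    return True

def ivi(n_states, n_actions, k, P, L, tutor, gamma=0.9, iters=300):
    """Hypothetical sketch of interactive value iteration.

    V[s] is a length-k vector: component i is the expected discounted number
    of times reward level i is collected. P[s][a] = [(s2, prob), ...] are the
    transitions, L[s][a] is the ordinal reward level earned by taking a in s,
    and tutor(v, w) answers whether value vector v is preferred to w.
    Returns the final value vectors and the number of tutor queries made.
    """
    V = [[0.0] * k for _ in range(n_states)]
    queries = 0
    for _ in range(iters):
        new_v = []
        for s in range(n_states):
            best = None
            for a in range(n_actions):
                # Bellman backup: one occurrence of level L[s][a], plus the
                # discounted expected value vector of the successor states.
                q = [0.0] * k
                q[L[s][a]] += 1.0
                for s2, p in P[s][a]:
                    for i in range(k):
                        q[i] += gamma * p * V[s2][i]
                if best is None or suffix_dominates(q, best):
                    best = q
                elif not suffix_dominates(best, q):
                    # Incomparable under the ordinal information alone:
                    # only now do we spend a query on the tutor.
                    queries += 1
                    if tutor(q, best):
                        best = q
            new_v.append(best)
        V = new_v
    return V, queries
```

Because dominated actions are discarded without interaction, the tutor is consulted only on genuinely ambiguous comparisons, which is the mechanism the abstract's query bound relies on.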
Year | Venue | Keywords |
---|---|---|
2013 | IJCAI | value iteration, interactive value iteration, markov decision process, possible reward function, unknown reward, hard task, new approach, reward function, optimal policy, numeric value |
DocType | Citations | PageRank |
---|---|---|
Conference | 9 | 0.54 |
References | Authors |
---|---|
14 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Paul Weng | 1 | 157 | 16.39 |
Bruno Zanuttini | 2 | 289 | 25.43 |