Title
Bounding Performance Loss in Approximate MDP Homomorphisms
Abstract
We define a metric for measuring behavior similarity between states in a Markov decision process (MDP), which takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in the optimal value function of different states can be upper-bounded by the value of this metric, and that the bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al., 2004, 2005). Our results hold both for discrete and for continuous actions. We provide an algorithm for constructing approximate homomorphisms, by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic is based mainly on heuristics.
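The abstract's central claim can be illustrated numerically. The sketch below is a minimal, illustrative fixed-point computation of a lax-bisimulation-style metric on a tiny hand-made deterministic MDP (the toy MDP, the weights, and all names are assumptions for illustration, not taken from the paper): each action of one state is matched against the best available action of the other, the reward gap and the metric between successor states are combined, and the result is symmetrized. One can then check the paper's headline property, that the optimal-value gap |V*(s) - V*(t)| is bounded by the metric d(s, t).

```python
# Illustrative sketch only: toy deterministic MDP, hypothetical weights.
GAMMA = 0.9             # discount factor (illustrative)
C_R, C_T = 1.0, GAMMA   # weights on the reward and transition terms

# Toy MDP: states 0..2; each (state, action) maps to (reward, next_state).
MDP = {
    0: {'a': (1.0, 1), 'b': (0.0, 2)},
    1: {'a': (1.0, 1), 'b': (0.0, 2)},   # state 1 behaves exactly like state 0
    2: {'a': (0.0, 2), 'b': (0.5, 0)},
}
STATES = list(MDP)

def metric_iteration(n_iters=200):
    """Iterate d(s,t) = max over s's actions of the best match in t, symmetrized."""
    d = {(s, t): 0.0 for s in STATES for t in STATES}
    for _ in range(n_iters):
        def side(x, y):
            # Every action of x must be matched by some action of y ("lax" matching).
            return max(
                min(C_R * abs(MDP[x][ax][0] - MDP[y][ay][0])
                    + C_T * d[(MDP[x][ax][1], MDP[y][ay][1])]
                    for ay in MDP[y])
                for ax in MDP[x])
        d = {(s, t): max(side(s, t), side(t, s))
             for s in STATES for t in STATES}
    return d

def value_iteration(n_iters=200):
    """Standard value iteration for the optimal value function V*."""
    V = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        V = {s: max(r + GAMMA * V[s2] for r, s2 in MDP[s].values())
             for s in STATES}
    return V

d = metric_iteration()
V = value_iteration()
# States 0 and 1 are behaviorally identical, so they lie in the metric's kernel
# and would be aggregated by an (approximate) homomorphism:
print(d[(0, 1)])  # → 0.0
# The value-difference bound holds for every pair of states:
print(all(abs(V[s] - V[t]) <= d[(s, t)] + 1e-6
          for s in STATES for t in STATES))  # → True
```

Here states 0 and 1 receive distance zero and collapse into one aggregate state, while state 2 stays at distance 0.5, exactly matching its optimal-value gap; the paper's contribution is proving such bounds in general (including continuous actions) rather than this toy construction.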
Year
2008
Venue
NIPS
Keywords
upper bound, markov decision process
Field
Kernel (linear algebra), Discrete mathematics, Mathematical optimization, Computer science, Markov decision process, Bellman equation, Heuristics, Homomorphism, Bisimulation, Bounding overwatch
DocType
Conference
Citations
15
PageRank
0.83
References
11
Authors
3
Name                 Order  Citations  PageRank
Jonathan Taylor      1      103        4.93
Doina Precup         2      2829       221.83
Prakash Panangaden   3      2248       188.43