Title
Bounding Performance Loss in Approximate MDP Homomorphisms
Abstract
We define a metric for measuring behavior similarity between states in a Markov decision process (MDP), which takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in the optimal value function of different states can be upper-bounded by the value of this metric, and that the bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al., 2004, 2005). Our results hold both for discrete and for continuous actions. We provide an algorithm for constructing approximate homomorphisms, by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic is based mainly on heuristics.
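The abstract's central claim can be illustrated numerically. The sketch below is a minimal, illustrative fixed-point computation of a lax-bisimulation-style metric on a tiny hand-made deterministic MDP (the toy MDP, the weights, and all names are assumptions for illustration, not taken from the paper): each action of one state is matched against the best available action of the other, the reward gap and the metric between successor states are combined, and the result is symmetrized. One can then check the paper's headline property, that the optimal-value gap |V*(s) - V*(t)| is bounded by the metric d(s, t).

```python
# Illustrative sketch only: toy deterministic MDP, hypothetical weights.
GAMMA = 0.9             # discount factor (illustrative)
C_R, C_T = 1.0, GAMMA   # weights on the reward and transition terms

# Toy MDP: states 0..2; each (state, action) maps to (reward, next_state).
MDP = {
    0: {'a': (1.0, 1), 'b': (0.0, 2)},
    1: {'a': (1.0, 1), 'b': (0.0, 2)},   # state 1 behaves exactly like state 0
    2: {'a': (0.0, 2), 'b': (0.5, 0)},
}
STATES = list(MDP)

def metric_iteration(n_iters=200):
    """Iterate d(s,t) = max over s's actions of the best match in t, symmetrized."""
    d = {(s, t): 0.0 for s in STATES for t in STATES}
    for _ in range(n_iters):
        def side(x, y):
            # Every action of x must be matched by some action of y ("lax" matching).
            return max(
                min(C_R * abs(MDP[x][ax][0] - MDP[y][ay][0])
                    + C_T * d[(MDP[x][ax][1], MDP[y][ay][1])]
                    for ay in MDP[y])
                for ax in MDP[x])
        d = {(s, t): max(side(s, t), side(t, s))
             for s in STATES for t in STATES}
    return d

def value_iteration(n_iters=200):
    """Standard value iteration for the optimal value function V*."""
    V = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        V = {s: max(r + GAMMA * V[s2] for r, s2 in MDP[s].values())
             for s in STATES}
    return V

d = metric_iteration()
V = value_iteration()
# States 0 and 1 are behaviorally identical, so they lie in the metric's kernel
# and would be aggregated by an (approximate) homomorphism:
print(d[(0, 1)])  # → 0.0
# The value-difference bound holds for every pair of states:
print(all(abs(V[s] - V[t]) <= d[(s, t)] + 1e-6
          for s in STATES for t in STATES))  # → True
```

Here states 0 and 1 receive distance zero and collapse into one aggregate state, while state 2 stays at distance 0.5, exactly matching its optimal-value gap; the paper's contribution is proving such bounds in general (including continuous actions) rather than this toy construction.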
Year
2008
Venue
NIPS
Keywords
upper bound, markov decision process
Field
Kernel (linear algebra), Discrete mathematics, Mathematical optimization, Computer science, Markov decision process, Bellman equation, Heuristics, Homomorphism, Bisimulation, Bounding overwatch
DocType
Conference
Citations
15
PageRank
0.83
References
11
Authors
3
Name                 Order  Citations  PageRank
Jonathan Taylor      1      103        4.93
Doina Precup         2      2829       221.83
Prakash Panangaden   3      2248       188.43