Title
A First Empirical Study of Emphatic Temporal Difference Learning
Abstract
In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under off-policy training (Sutton, Mahmood & White 2016), but it is also a new algorithm for the on-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of bounce. In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower.
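The abstract compares linear TD(0) with emphatic TD. A minimal sketch of the two per-step updates under linear function approximation, following the general form in Sutton, Mahmood & White (2016) with trace parameter lambda = 0; the function names, the constant interest of 1, and the toy two-state demo in the test are illustrative assumptions, not the paper's code:

```python
import numpy as np

def td0_update(w, x, r, x_next, alpha, gamma, rho=1.0):
    """One step of linear TD(0); rho is the per-step importance
    sampling ratio (rho = 1 recovers on-policy TD(0))."""
    delta = r + gamma * w @ x_next - w @ x   # TD error
    return w + alpha * rho * delta * x

def etd0_update(w, F, x, r, x_next, alpha, gamma, rho, rho_prev, interest=1.0):
    """One step of emphatic TD(0). The follow-on trace F accumulates
    past importance ratios and scales (emphasizes) the update; with
    lambda = 0 the emphasis M equals F."""
    F = rho_prev * gamma * F + interest      # follow-on trace
    M = F                                    # emphasis (lambda = 0)
    delta = r + gamma * w @ x_next - w @ x   # TD error
    return w + alpha * rho * M * delta * x, F
```

On-policy (all ratios equal to 1) both updates share the same fixed point; off-policy, the emphasis weighting is what gives ETD its convergence guarantee.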
Year
2017
Venue
arXiv: Artificial Intelligence
Field
Convergence (routing), Temporal difference learning, Asymptote, Computer science, Artificial intelligence, Empirical research
DocType
Journal
Volume
abs/1705.04185
Citations
1
PageRank
0.39
References
4
Authors
3
Name               Order  Citations  PageRank
Sina Ghiassian     1      4          2.49
Banafsheh Rafiee   2      4          0.80
Richard S. Sutton  3      61001      436.83