Abstract
---
In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy variations of the Mountain Car problem. The initial motivation for developing ETD was that it has good convergence properties under *off*-policy training (Sutton, Mahmood & White 2016), but it is also a new algorithm for the *on*-policy case. In both our on-policy and off-policy experiments, we found that each method converged to a characteristic asymptotic level of error, with ETD better than TD(0). TD(0) achieved a still lower error level temporarily before falling back to its higher asymptote, whereas ETD never showed this kind of bounce. In the off-policy case (in which TD(0) is not guaranteed to converge), ETD was significantly slower.
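Since the abstract turns on the difference between the two update rules, a minimal sketch of one step of each may help. It follows the standard linear TD(0) update and the ETD(λ) update of Sutton, Mahmood & White (2016) specialized to λ = 0 with constant interest; the function names, arguments, and feature-vector interface are our own illustrative choices, not the paper's code.

```python
import numpy as np

def td0_step(theta, phi, phi_next, reward, alpha, gamma):
    """One linear TD(0) update: theta += alpha * delta * phi(S_t)."""
    delta = reward + gamma * theta @ phi_next - theta @ phi
    return theta + alpha * delta * phi

def etd0_step(theta, F_prev, rho_prev, phi, phi_next, reward,
              alpha, gamma, rho, interest=1.0):
    """One ETD(0) update with the follow-on trace F (lambda = 0).

    rho is the importance-sampling ratio pi(A_t|S_t) / b(A_t|S_t);
    on-policy it is identically 1, so F grows toward 1 / (1 - gamma).
    Initializing F_prev = 0 makes the first F equal the interest.
    """
    F = gamma * rho_prev * F_prev + interest        # follow-on trace F_t
    delta = reward + gamma * theta @ phi_next - theta @ phi
    # Emphatic update: at lambda = 0 the emphasis M_t equals F_t
    theta = theta + alpha * F * rho * delta * phi
    return theta, F
```

Across an episode one carries F and the previous step's rho forward, resetting F at episode start; in the on-policy Mountain Car variation rho is simply 1 everywhere, so the only difference from TD(0) is the emphasis weighting F on each update.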
Year | Venue | Field
---|---|---
2017 | arXiv: Artificial Intelligence | Convergence (routing), Temporal difference learning, Asymptote, Computer science, Artificial intelligence, Empirical research

DocType | Volume | Citations
---|---|---
Journal | abs/1705.04185 | 1

PageRank | References | Authors
---|---|---
0.39 | 4 | 3

Name | Order | Citations | PageRank
---|---|---|---
Sina Ghiassian | 1 | 4 | 2.49 |
Banafsheh Rafiee | 2 | 4 | 0.80 |
Richard S. Sutton | 3 | 6100 | 1436.83 |