Title: Proximal Gradient Temporal Difference Learning Algorithms
Abstract: In this paper, we describe proximal gradient temporal difference learning, which provides a principled way for designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to primal-dual saddle-point objective functions. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and no finite-sample analysis had been attempted. An accelerated algorithm is also proposed, namely GTD2-MP, which uses proximal "mirror maps" to yield acceleration. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
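For concreteness, the GTD2-MP method described in the abstract is an extragradient (mirror-prox) variant of GTD2: each iteration takes a half step in both the primal value-function weights and the auxiliary dual weights, re-evaluates the TD error at that midpoint, and then takes the full step from the original iterate. The sketch below is a minimal illustration of that pattern for linear value-function approximation; the function and variable names, step-size choices, and the omission of off-policy importance-sampling weights are assumptions made here for readability, not details taken from the record above.

    import numpy as np

    def gtd2_mp_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
        # Half step: TD error and gradient terms evaluated at the current iterate.
        # (In the off-policy setting, each sampled term would also be scaled by an
        # importance-sampling ratio; it is dropped here for brevity.)
        delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)
        theta_mid = theta + alpha * (phi - gamma * phi_next) * phi.dot(w)
        w_mid = w + beta * (delta - phi.dot(w)) * phi
        # Full step: re-evaluate the TD error at the midpoint (the extragradient
        # step), then update the original iterates using the midpoint quantities.
        delta_mid = reward + gamma * phi_next.dot(theta_mid) - phi.dot(theta_mid)
        theta_new = theta + alpha * (phi - gamma * phi_next) * phi.dot(w_mid)
        w_new = w + beta * (delta_mid - phi.dot(w_mid)) * phi
        return theta_new, w_new

    # Illustrative usage with random features:
    # theta, w = np.zeros(4), np.zeros(4)
    # theta, w = gtd2_mp_step(theta, w, np.random.rand(4), np.random.rand(4),
    #                         reward=1.0, gamma=0.95, alpha=0.05, beta=0.05)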
Year: 2016
Venue: IJCAI
Field: Convergence, Saddle point, Least squares, Temporal difference learning, Computer science, Algorithm, Proximal gradient methods, Artificial intelligence, Acceleration, Stochastic approximation, Machine learning, Reinforcement learning
DocType: Conference
Citations: 1
PageRank: 0.37
References: 12
Authors: 5
Name                  Order  Citations  PageRank
Bo Liu                1      521        84.67
Ji Liu                2      1352       77.54
Mohammad Ghavamzadeh  3      814        67.73
Sridhar Mahadevan     4      1965       240.81
Marek Petrik          5      211        24.23