Abstract
---
Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to non-convexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD.
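To make the setting concrete, the following is a minimal sketch of neural TD(0) for policy evaluation: an overparametrized two-layer ReLU network is trained with semi-gradient TD updates on a toy three-state chain MDP. The network architecture, width, step size, and MDP are all hypothetical illustrative choices, not taken from the paper; fixing the output layer and scaling by `1/sqrt(m)` loosely mirrors the overparametrized regime the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, gamma, m = 3, 0.9, 256           # m: width of the overparametrized network
X = np.eye(n_states)                       # one-hot state features
P = np.array([[0.0, 1.0, 0.0],             # fixed-policy transition matrix (a cycle)
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
r = np.array([0.0, 0.0, 1.0])              # expected reward on leaving each state

# Ground-truth values solve the Bellman equation V = r + gamma * P V
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Two-layer ReLU network: V(x) = (1/sqrt(m)) * b . relu(W x),
# with the output layer b held fixed (a common simplification in this regime)
W = rng.normal(size=(m, n_states))
b = rng.choice([-1.0, 1.0], size=m)

def value(x):
    return b @ np.maximum(W @ x, 0.0) / np.sqrt(m)

alpha = 0.1
s = 0
for _ in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    x, x_next = X[s], X[s_next]
    delta = r[s] + gamma * value(x_next) - value(x)   # TD error
    # Semi-gradient step: only the prediction V(x) is differentiated,
    # the bootstrap target r + gamma * V(x') is treated as a constant
    grad = (b * (W @ x > 0.0))[:, None] * x[None, :] / np.sqrt(m)
    W += alpha * delta * grad
    s = s_next

V_learned = np.array([value(X[i]) for i in range(n_states)])
```

On this deterministic chain the learned values approach `V_true` closely; the sketch only illustrates the update rule, not the paper's convergence analysis.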
Year | Venue | Keywords |
---|---|---
2019 | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | neural networks |
Field | DocType | Volume
---|---|---
Sublinear function, Convergence, Mathematical optimization, Temporal difference learning, Nonlinear system, Coupling, Bellman equation, Artificial neural network, Mathematics, Reinforcement learning | Journal | 32

ISSN | Citations | PageRank
---|---|---
1049-5258 | 4 | 0.39

References | Authors
---|---
0 | 4
Name | Order | Citations | PageRank
---|---|---|---
Qi Cai | 1 | 7 | 4.19
Zhuoran Yang | 2 | 52 | 29.86
Jason D. Lee | 3 | 711 | 48.29
Zhaoran Wang | 4 | 157 | 33.20