Abstract
---
Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to non-convexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD.
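To make the setting concrete, the following is a minimal sketch of neural TD(0) for policy evaluation: an overparametrized two-layer ReLU network is trained with semi-gradient TD updates on a toy three-state chain MDP. The network architecture, width, step size, and MDP are all hypothetical illustrative choices, not taken from the paper; fixing the output layer and scaling by `1/sqrt(m)` loosely mirrors the overparametrized regime the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, gamma, m = 3, 0.9, 256           # m: width of the overparametrized network
X = np.eye(n_states)                       # one-hot state features
P = np.array([[0.0, 1.0, 0.0],             # fixed-policy transition matrix (a cycle)
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
r = np.array([0.0, 0.0, 1.0])              # expected reward on leaving each state

# Ground-truth values solve the Bellman equation V = r + gamma * P V
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Two-layer ReLU network: V(x) = (1/sqrt(m)) * b . relu(W x),
# with the output layer b held fixed (a common simplification in this regime)
W = rng.normal(size=(m, n_states))
b = rng.choice([-1.0, 1.0], size=m)

def value(x):
    return b @ np.maximum(W @ x, 0.0) / np.sqrt(m)

alpha = 0.1
s = 0
for _ in range(20000):
    s_next = rng.choice(n_states, p=P[s])
    x, x_next = X[s], X[s_next]
    delta = r[s] + gamma * value(x_next) - value(x)   # TD error
    # Semi-gradient step: only the prediction V(x) is differentiated,
    # the bootstrap target r + gamma * V(x') is treated as a constant
    grad = (b * (W @ x > 0.0))[:, None] * x[None, :] / np.sqrt(m)
    W += alpha * delta * grad
    s = s_next

V_learned = np.array([value(X[i]) for i in range(n_states)])
```

On this deterministic chain the learned values approach `V_true` closely; the sketch only illustrates the update rule, not the paper's convergence analysis.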
Year | Venue | Keywords |
---|---|---
2019 | ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | neural networks |
Field | DocType | Volume
---|---|---
Sublinear function, Convergence, Mathematical optimization, Temporal difference learning, Nonlinear system, Coupling, Bellman equation, Artificial neural network, Mathematics, Reinforcement learning | Journal | 32

ISSN | Citations | PageRank
---|---|---
1049-5258 | 4 | 0.39

References | Authors
---|---
0 | 4
Name | Order | Citations | PageRank
---|---|---|---
Qi Cai | 1 | 7 | 4.19
Zhuoran Yang | 2 | 52 | 29.86
Jason D. Lee | 3 | 711 | 48.29
Zhaoran Wang | 4 | 157 | 33.20