Abstract
Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, a finite-time analysis of Greedy-GQ was developed under linear function approximation and Markovian sampling, showing that the algorithm reaches an ϵ-stationary point with a sample complexity of order O(ϵ⁻³). This high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies an SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves much smaller bias and variance errors than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample complexity of order O(ϵ⁻²). We further compare the performance of VR-Greedy-GQ with that of Greedy-GQ in various RL experiments to corroborate our theoretical findings.
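The core mechanism described in the abstract is SVRG-style recentering applied to two time-scale stochastic updates. Below is a minimal sketch of that generic scheme, assuming hypothetical stochastic (semi-)gradient oracles `grad_theta` and `grad_w` and a fixed batch of samples; it illustrates the standard SVRG correction, not the authors' exact VR-Greedy-GQ updates (which operate on Markovian samples with the Greedy-GQ semi-gradients).

```python
import numpy as np

def svrg_two_timescale_sketch(grad_theta, grad_w, batch, theta0, w0,
                              alpha=0.01, beta=0.05,
                              num_epochs=10, inner_steps=100, rng=None):
    """SVRG-style variance reduction for a two time-scale update (sketch).

    grad_theta(theta, w, sample) and grad_w(theta, w, sample) are assumed
    stochastic (semi-)gradient oracles for the main parameter theta and
    the auxiliary parameter w. This interface is hypothetical, chosen for
    illustration only.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    theta, w = theta0.copy(), w0.copy()
    for _ in range(num_epochs):
        # Reference point: full-batch gradients, recomputed once per epoch.
        theta_ref, w_ref = theta.copy(), w.copy()
        g_theta_ref = np.mean([grad_theta(theta_ref, w_ref, s) for s in batch], axis=0)
        g_w_ref = np.mean([grad_w(theta_ref, w_ref, s) for s in batch], axis=0)
        for _ in range(inner_steps):
            s = batch[rng.integers(len(batch))]
            # SVRG correction: stochastic gradient at the current iterate,
            # recentered by the same sample's gradient at the reference
            # point plus the full-batch reference gradient. This keeps the
            # update unbiased while shrinking its variance.
            v_theta = grad_theta(theta, w, s) - grad_theta(theta_ref, w_ref, s) + g_theta_ref
            v_w = grad_w(theta, w, s) - grad_w(theta_ref, w_ref, s) + g_w_ref
            theta -= alpha * v_theta  # slow time-scale (alpha < beta)
            w -= beta * v_w           # fast time-scale; sign follows the oracle convention
    return theta, w
```

The two step sizes encode the two time scales: the auxiliary parameter w is driven with the larger step size beta so it tracks its target faster than the main parameter theta moves.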
Year | Venue | DocType |
---|---|---|
2021 | ICLR | Conference |
Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors (4)
Name | Order | Citations | PageRank |
---|---|---|---|
Shaocong Ma | 1 | 0 | 1.69 |
Ziyi Chen | 2 | 0 | 1.01 |
Yi Zhou | 3 | 0 | 1.69 |
Shaofeng Zou | 4 | 53 | 14.20 |