Abstract
---
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the VRMPO algorithm, a sample-efficient policy gradient method based on stochastic mirror descent. VRMPO introduces a novel variance-reduced policy gradient estimator to improve sample efficiency. We prove that the proposed VRMPO needs only O(ε⁻³) sample trajectories to achieve an ε-approximate first-order stationary point, which matches the best known sample complexity for policy optimization. Extensive empirical results demonstrate that VRMPO outperforms state-of-the-art policy gradient methods in various settings.
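The paper's own implementation is not reproduced here, but the two ingredients the abstract names, a variance-reduced gradient estimator and a stochastic mirror descent update, can be illustrated on a toy problem. The sketch below is an assumption, not VRMPO itself: it pairs a SARAH/SPIDER-style recursive estimator (a standard route to O(ε⁻³)-type rates) with an exponentiated-gradient mirror step on a simple bandit objective. The names `r_true`, `lam`, `grad_batch`, and `mirror_step` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical toy problem (not from the paper) ----------------------
# Policy = point theta on the 5-simplex; objective
#   J(theta) = <theta, r_true> - (lam/2) * ||theta||^2,
# whose gradient r_true - lam * theta is only seen through noisy samples,
# a stand-in for trajectory-based policy gradient estimates.
r_true = np.array([0.1, 0.2, 0.5, 0.3, 0.9])
lam = 0.1

def grad_batch(theta, xi):
    """Stochastic gradient of J at theta, on a shared noise batch xi."""
    return (r_true + xi).mean(axis=0) - lam * theta

def mirror_step(theta, v, lr):
    """One stochastic mirror ascent step under the negative-entropy
    mirror map (exponentiated gradient); keeps theta on the simplex."""
    w = theta * np.exp(lr * v)
    return w / w.sum()

theta = np.full(5, 0.2)                            # uniform initial policy
v = grad_batch(theta, rng.normal(size=(1000, 5)))  # large-batch checkpoint
for t in range(1, 201):
    theta_prev, theta = theta, mirror_step(theta, v, lr=0.1)
    if t % 50 == 0:   # periodic large-batch refresh of the estimator
        v = grad_batch(theta, rng.normal(size=(1000, 5)))
    else:             # SARAH/SPIDER-style recursive correction:
        xi = rng.normal(size=(10, 5))   # SAME minibatch at both points
        v = grad_batch(theta, xi) - grad_batch(theta_prev, xi) + v

print("learned policy:", np.round(theta, 3))  # mass shifts to the best arm
```

The checkpoint/correction split is the variance-reduction idea: the recursive term reuses the accumulated estimate, so each step only re-estimates the small-batch difference of gradients at neighboring iterates rather than the full gradient.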
Year | Venue | Keywords
---|---|---|
2022 | AAAI Conference on Artificial Intelligence | Machine Learning (ML)

DocType | Citations | PageRank
---|---|---|
Conference | 0 | 0.34

References | Authors
---|---|
0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Long Yang | 1 | 2 | 2.08 |
Yu Zhang | 2 | 0 | 1.01 |
Gang Zheng | 3 | 0 | 1.35 |
Qian Zheng | 4 | 0 | 0.68 |
Pengfei Li | 5 | 3 | 1.71 |
Jianhang Huang | 6 | 1 | 0.73 |
Gang Pan | 7 | 0 | 1.35 |