Title
Divergence-Augmented Policy Optimization
Abstract
In deep reinforcement learning, policy optimization methods must cope with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to augment the objective with a Bregman divergence between the behavior policy that generates the data and the current policy, ensuring small and safe policy updates with off-policy data. The Bregman divergence is computed between the state distributions of the two policies, rather than only on the action probabilities, leading to a divergence-augmented formulation. Experiments on Atari games show that in the data-scarce regime, where reusing off-policy data becomes necessary, our method achieves better performance than other state-of-the-art deep reinforcement learning algorithms.
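For reference, a schematic form of the divergence-augmented objective described in the abstract; the trade-off weight \lambda, the state visitation distributions d^{\beta} and d^{\pi}, and the advantage A^{\beta} are illustrative notation, not taken from the paper:

\max_{\pi}\; \mathbb{E}_{(s,a)\sim d^{\beta}}\!\left[ \frac{\pi(a\mid s)}{\beta(a\mid s)}\, A^{\beta}(s,a) \right] \;-\; \lambda\, D_{\Phi}\!\left(d^{\beta},\, d^{\pi}\right),
\qquad
D_{\Phi}(p, q) \;=\; \Phi(p) - \Phi(q) - \left\langle \nabla\Phi(q),\, p - q \right\rangle,

where \beta is the behavior policy that generated the data, \pi is the current policy, and \Phi is the convex function defining the Bregman divergence. The divergence is taken between the state distributions d^{\beta} and d^{\pi} rather than between the per-state action probabilities, as the abstract emphasizes.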
Year: 2019
Venue: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Keywords: Bregman divergence, premature convergence
DocType: Conference
Volume: 32
ISSN: 1049-5258
Citations: 0
PageRank: 0.34
References: 0
Authors: 4

Name            Order   Citations   PageRank
Qing Wang       1       345         76.64
Yingru Li       2       0           1.01
Jiechao Xiong   3       42          6.14
Tong Zhang      4       7126        611.43