Title
Gradient Information Matters in Policy Optimization by Back-propagating through Model
Abstract
Model-based reinforcement learning provides an efficient mechanism for finding the optimal policy by interacting with a learned environment model. Beyond treating the learned model as a black-box simulator, a more effective way to use it is to exploit its differentiability. Such methods require the gradient of the learned environment model when computing the policy gradient. However, since the gradient error is not considered in the model learning phase, there is no guarantee on the accuracy of the model's gradient. To address this problem, we first analyze the convergence rate of policy optimization methods when the policy gradient is computed using the learned environment model. The theoretical results show that the model gradient error matters in the policy optimization phase. We then propose a two-model-based learning method that controls both the prediction error and the gradient error: the two models play different roles in the model learning phase and are coordinated in the policy optimization phase. Building on this method, we introduce the directional derivative projection policy optimization (DDPPO) algorithm as a practical implementation for finding the optimal policy. Finally, we empirically demonstrate that the proposed algorithm achieves better sample efficiency while reaching comparable or better performance on benchmark continuous control tasks.
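The abstract describes computing policy gradients by back-propagating through a learned, differentiable environment model. Below is a minimal, generic PyTorch sketch of that idea, assuming toy network architectures, a learned reward model, and a fixed rollout horizon; it illustrates only the general technique, not the paper's two-model method or the DDPPO algorithm, and all names and dimensions are illustrative assumptions.

# Minimal sketch: policy optimization by back-propagating through a learned model.
# The cumulative reward of a model rollout is differentiated with respect to the
# policy parameters. Architectures, horizon, and the reward model are assumptions
# for illustration, not the paper's DDPPO algorithm.
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 4, 2, 10

policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
# Learned dynamics model: predicts the next state from (state, action).
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, state_dim))
# Learned reward model: predicts the reward of (state, action).
reward_model = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(), nn.Linear(64, 1))

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.randn(1, state_dim)  # hypothetical start state
total_reward = torch.zeros(1)
for _ in range(horizon):
    action = policy(state)
    sa = torch.cat([state, action], dim=-1)
    total_reward = total_reward + reward_model(sa).squeeze(-1)
    # The gradient flows through the learned dynamics model here, so errors in the
    # model's gradient (not only its predictions) directly bias the policy update.
    state = dynamics(sa)

loss = -total_reward.mean()
optimizer.zero_grad()
loss.backward()   # back-propagates through the multi-step model rollout
optimizer.step()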
Year
2022
Venue
International Conference on Learning Representations (ICLR)
Keywords
Model-based RL, Policy Optimization
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
6
Name            Order  Citations  PageRank
Chongchong Li   1      0          0.34
Wang, Yue       2      10         1.39
Wei Chen        3      166        14.55
Liu, Yuting     4      2          1.04
Zhi-Ming Ma     5      227        18.26
Tie-yan Liu     6      4662       256.32