Model-Ensemble Trust-Region Policy Optimization. - Citegraph

Paper Info

Title
Model-Ensemble Trust-Region Policy Optimization.

Abstract
Model-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. They tend to suffer from high sample complexity, however, which hinders their use in real-world domains. Alternatively, model-based reinforcement learning promises to reduce sample complexity, but tends to require careful tuning and to date has succeeded mainly in restrictive domains where simple models are sufficient for learning. In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training. To overcome this issue, we propose to use an ensemble of models to maintain the model uncertainty and regularize the learning process. We further show that the use of likelihood ratio derivatives yields much more stable learning. Altogether, our approach, Model-Ensemble Trust-Region Policy Optimization (ME-TRPO), significantly reduces the sample complexity compared to model-free deep RL methods on challenging continuous control benchmark tasks.
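
The ensemble idea in the abstract can be illustrated with a minimal toy sketch. It is not the paper's implementation: linear least-squares models fitted on bootstrap resamples stand in for deep neural network dynamics models, and the 1-D system, the helper names (`fit_linear`, `train_ensemble`, `disagreement`), and all constants are assumptions made for illustration only. The sketch shows that ensemble members tend to agree where data was collected and to disagree where it was not, which is the kind of region a learned policy could otherwise exploit.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Ordinary least squares for y = w * x + b on 1-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def train_ensemble(states, next_states, n_models, rng):
    """Fit each member on its own bootstrap resample (an assumption of this sketch)."""
    n = len(states)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [states[i] for i in idx]
        ys = [next_states[i] for i in idx]
        ensemble.append(fit_linear(xs, ys))
    return ensemble


def disagreement(ensemble, s):
    """Standard deviation of the members' next-state predictions at state s."""
    return statistics.pstdev([w * s + b for w, b in ensemble])


rng = random.Random(0)
# Toy data: a hypothetical 1-D system observed only for states in [-1, 1].
states = [rng.uniform(-1.0, 1.0) for _ in range(200)]
next_states = [0.8 * s + rng.gauss(0.0, 0.1) for s in states]

ensemble = train_ensemble(states, next_states, n_models=10, rng=rng)
near = disagreement(ensemble, 0.0)   # inside the region covered by data
far = disagreement(ensemble, 10.0)   # far outside that region
print(f"disagreement near data: {near:.4f}, far from data: {far:.4f}")
```

In this toy setting the disagreement grows with distance from the training data, so it can serve as a rough signal of where the learned models should not be trusted.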

Year	2018
Venue	ICLR
Field	Backpropagation through time, Trust region, Exploit, Artificial intelligence, Deep learning, Sample complexity, Deep neural networks, Machine learning, Mathematics, Reinforcement learning
DocType	Journal
Volume	abs/1802.10592
Citations	18
PageRank	0.63
References	20
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Thanard Kurutach	1	24	2.44
Ignasi Clavera	2	37	4.62
Yan Duan	3	775	27.97
Aviv Tamar	4	275	24.04
Pieter Abbeel	5	6363	376.48
