ProMP: Proximal Meta-Policy Search. - Citegraph

Paper Info

Title
ProMP: Proximal Meta-Policy Search.

Abstract
Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.

Year	2018
Venue	International Conference on Learning Representations
Field	Credit assignment, Mathematical optimization, Neglect, Artificial intelligence, Statistical distance, Mathematics, Machine learning
DocType	Journal
Volume	abs/1810.06784
Citations	2
PageRank	0.36
References	11
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Jonas Rothfuss	1	6	2.45
Dennis Lee	2	22	1.63
Ignasi Clavera	3	37	4.62
Tamim Asfour	4	1889	151.86
Pieter Abbeel	5	6363	376.48
