Abstract |
---|
Adversarial methods for imitation learning have been shown to perform well on various control tasks. However, they require a large number of environment interactions to converge. In this paper, we propose an end-to-end differentiable adversarial imitation learning algorithm in a Dyna-like framework that switches between model-based planning and model-free learning from expert data. Our results on both discrete and continuous environments show that combining model-based planning with model-free learning converges to an optimal policy with fewer environment interactions than state-of-the-art learning methods. |
Year | Venue | DocType |
---|---|---|
2019 | arXiv: Learning | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1903.03234 | 0 | 0.34 |
References | Authors |
---|---|
9 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vaibhav Saxena | 1 | 0 | 1.01 |
Srinivasan Sivanandan | 2 | 0 | 0.68 |
Pulkit Mathur | 3 | 0 | 0.34 |