Abstract | ||
---|---|---|
We propose Generative Adversarial Tree Search (GATS), a sample-efficient Deep Reinforcement Learning (DRL) algorithm. While Monte Carlo Tree Search (MCTS) is known to be effective for search and planning in RL, it is often sample-inefficient and therefore expensive to apply in practice. In this work, we develop a Generative Adversarial Network (GAN) architecture to model an environment's dynamics and a predictor model for the reward function. We exploit collected data from interaction with the environment to learn these models, which we then use for model-based planning. During planning, we deploy a finite depth MCTS, using the learned model for tree search and a learned Q-value for the leaves, to find the best action. We theoretically show that GATS improves the bias-variance trade-off in value-based DRL. Moreover, we show that the generative model learns the model dynamics using orders of magnitude fewer samples than the Q-learner. In non-stationary settings where the environment model changes, we find the generative model adapts significantly faster than the Q-learner to the new environment. |
Year | Venue | Field |
---|---|---|
2018 | arXiv: Learning | Computer science, Artificial intelligence, Generative grammar, Adversarial system |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1806.05780 | 2 |
PageRank | References | Authors |
---|---|---|
0.36 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kamyar Azizzadenesheli | 1 | 2 | 3.07 |
Brandon Yang | 2 | 65 | 3.08 |
Weitang Liu | 3 | 18 | 1.54 |
Emma Brunskill | 4 | 673 | 90.33 |
Zachary Chase Lipton | 5 | 534 | 45.49 |
Animashree Anandkumar | 6 | 1629 | 116.30 |
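
The planning step described in the abstract (a finite-depth search over a learned model, with a learned Q-value at the leaves) can be illustrated with a minimal sketch. This is not the authors' implementation: for brevity it expands every action exhaustively to a fixed depth instead of sampling as MCTS does, and the callables `dynamics`, `reward`, and `q_value` are hypothetical stand-ins for the learned GAN dynamics model, the reward predictor, and the Q-network. The toy chain environment at the end is likewise an assumption made only for illustration.

```python
from typing import Callable, Sequence, Tuple

def plan(state, actions: Sequence[int],
         dynamics: Callable, reward: Callable, q_value: Callable,
         depth: int, gamma: float = 0.99) -> Tuple[float, int]:
    """Depth-limited lookahead; returns (estimated value, best action).

    dynamics(s, a) -> next state   (hypothetical stand-in for the learned model)
    reward(s, a)   -> float        (hypothetical stand-in for the reward predictor)
    q_value(s, a)  -> float        (hypothetical stand-in for the learned Q-function)
    """
    if depth == 0:
        # Leaf node: fall back on the learned Q-value instead of searching further.
        best_action = max(actions, key=lambda a: q_value(state, a))
        return q_value(state, best_action), best_action

    best_value, best_action = float("-inf"), actions[0]
    for a in actions:
        next_state = dynamics(state, a)  # imagined transition, no real environment step
        future_value, _ = plan(next_state, actions, dynamics, reward,
                               q_value, depth - 1, gamma)
        value = reward(state, a) + gamma * future_value
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Toy chain environment (an assumption for illustration only):
# action 1 moves right, action 0 stays; stepping into state 2 pays reward 1.
def toy_dynamics(s, a):
    return s + a

def toy_reward(s, a):
    return 1.0 if (a == 1 and s + a == 2) else 0.0

def toy_q(s, a):
    return 0.0  # an uninformed Q-function, so any preference comes from the lookahead

value, action = plan(0, [0, 1], toy_dynamics, toy_reward, toy_q, depth=2, gamma=0.5)
```

In this toy setting the Q-function alone cannot distinguish the two actions, while a two-step lookahead through the model prefers moving right. This mirrors, in a very reduced form, the idea of combining model rollouts with a learned value at the leaves.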