| Abstract |
| --- |
| Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a “fast” reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL^2, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose (“slow”) RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the “fast” RL algorithm on the current (previously unseen) MDP. We evaluate RL^2 experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-armed bandit problems and finite MDPs. After RL^2 is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL^2 on a vision-based navigation task and show that it scales up to high-dimensional problems. |
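The interaction loop the abstract describes can be sketched concretely: at each step the recurrent policy receives the current observation together with the previous action (one-hot), the previous reward, and a termination flag, and its hidden state is reset only at the start of a trial, not between episodes. The toy two-armed bandit, the plain tanh RNN cell, and all function names below are illustrative assumptions, not the paper's actual architecture or training procedure (the weights here are random, not trained by the "slow" outer RL algorithm):

```python
import numpy as np

def rnn_cell(x, h, W_x, W_h):
    """One step of a plain tanh RNN cell (stand-in for the paper's RNN)."""
    return np.tanh(x @ W_x + h @ W_h)

def run_trial(bandit_probs, n_episodes, steps_per_episode, rng):
    """Run one RL^2-style trial: several episodes of the SAME bandit,
    with the RNN hidden state carried across episode boundaries."""
    n_actions = len(bandit_probs)
    obs_dim = 1                       # bandits have only a dummy observation
    in_dim = obs_dim + n_actions + 2  # obs + one-hot action + reward + done flag
    hid_dim = 8
    # Random (untrained) weights, just to exercise the data flow.
    W_x = rng.standard_normal((in_dim, hid_dim)) * 0.1
    W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.1
    W_out = rng.standard_normal((hid_dim, n_actions)) * 0.1

    h = np.zeros(hid_dim)             # hidden state: reset only at trial start
    prev_action, prev_reward, done = 0, 0.0, 0.0
    rewards = []
    for _ in range(n_episodes):
        for t in range(steps_per_episode):
            a_onehot = np.eye(n_actions)[prev_action]
            # Input = (observation, previous action, previous reward, done flag),
            # exactly the information a hand-designed bandit algorithm would see.
            x = np.concatenate(([0.0], a_onehot, [prev_reward, done]))
            h = rnn_cell(x, h, W_x, W_h)   # state persists across episodes
            action = int(np.argmax(h @ W_out))
            reward = float(rng.random() < bandit_probs[action])
            rewards.append(reward)
            prev_action, prev_reward = action, reward
            done = 1.0 if t == steps_per_episode - 1 else 0.0
    return rewards

rng = np.random.default_rng(0)
rewards = run_trial([0.2, 0.8], n_episodes=3, steps_per_episode=5, rng=rng)
print(len(rewards))  # 3 episodes x 5 pulls = 15 rewards
```

Because the hidden state survives episode boundaries, information gathered in early episodes (e.g. which arm pays off) can inform behavior in later episodes of the same trial; a trained network would exploit this to explore first and exploit later.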
| Year | Venue | Field |
| --- | --- | --- |
| 2016 | arXiv: Artificial Intelligence | Computer science, Markov decision process, Q-learning, Recurrent neural network, Artificial intelligence, Deep learning, Reinforcement learning algorithm, Error-driven learning, Machine learning, Learning classifier system, Reinforcement learning |
| DocType | Volume | Citations |
| --- | --- | --- |
| Journal | abs/1611.02779 | 0 |
| PageRank | References | Authors |
| --- | --- | --- |
| 0.34 | 0 | 6 |
| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Yan Duan | 1 | 775 | 27.97 |
| John Schulman | 2 | 1806 | 66.95 |
| Xi Chen | 3 | 1649 | 54.94 |
| Peter L. Bartlett | 4 | 5482 | 1039.97 |
| Ilya Sutskever | 5 | 25814 | 1120.24 |
| Pieter Abbeel | 6 | 6363 | 376.48 |