| Title |
|---|
| Switch-Based Active Deep Dyna-Q: Efficient Adaptive Planning for Task-Completion Dialogue Policy Learning |
| Abstract |
|---|
| Training task-completion dialogue agents with reinforcement learning usually requires a large number of real user experiences. The Dyna-Q algorithm extends Q-learning by integrating a world model, and thus can effectively boost training efficiency using simulated experiences generated by the world model. The effectiveness of Dyna-Q, however, depends on the quality of the world model, or, implicitly, on the pre-specified ratio of real vs. simulated experiences used for Q-learning. To this end, we extend the recently proposed Deep Dyna-Q (DDQ) framework by integrating a switcher that automatically determines whether to use a real or simulated experience for Q-learning. Furthermore, we explore the use of active learning to improve sample efficiency, by encouraging the world model to generate simulated experiences in the regions of the state-action space that the agent has not (fully) explored. Our results show that by combining the switcher and active learning, the new framework, named Switch-based Active Deep Dyna-Q (Switch-DDQ), leads to significant improvements over DDQ and Q-learning baselines in both simulation and human evaluations. |
| Year | Venue | Field |
|---|---|---|
| 2018 | THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | Active learning, Policy learning, Computer science, Baseline (configuration management), Artificial intelligence, Task completion, Machine learning, Reinforcement learning |
| DocType | Volume | Citations |
|---|---|---|
| Journal | abs/1811.07550 | 1 |

| PageRank | References | Authors |
|---|---|---|
| 0.37 | 0 | 5 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Yuexin Wu | 1 | 99 | 5.78 |
| Xiujun Li | 2 | 139 | 11.73 |
| Jingjing Liu | 3 | 515 | 39.31 |
| Jianfeng Gao | 4 | 5729 | 296.43 |
| Yiming Yang | 5 | 5390 | 500.59 |