Title
Expert-based reward shaping and exploration scheme for boosting policy learning of dialogue management
Abstract
This paper investigates the conditions under which expert knowledge can be used to accelerate the policy optimization of a learning agent. Recent work on reinforcement learning for dialogue management has produced sophisticated value estimation methods that jointly address the exploration/exploitation dilemma, sample efficiency, and non-stationary environments. In this paper, a reward shaping method and an exploration scheme, both based on intuitive hand-coded expert advice, are combined with an efficient temporal difference-based learning procedure. The key objective is to boost the initial training stage, when the system is not yet reliable enough to interact with real users (e.g. clients). Our claims are illustrated by simulation-based experiments carried out using a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS).
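The general idea of combining expert-based reward shaping with expert-guided exploration can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a hypothetical six-state chain task in place of a dialogue manager, uses tabular Q-learning as a stand-in for the temporal difference procedure, and uses potential-based shaping as one common form of reward shaping. The names `expert_potential`, `expert_action`, `shaped_reward`, `train`, and `greedy_policy` are all illustrative assumptions.

```python
import random

N_STATES = 6        # hypothetical toy chain: states 0..5, state 5 is the goal
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.95        # discount factor


def expert_potential(state):
    # Hand-coded expert heuristic (assumption): states nearer the goal score higher.
    return float(state)


def expert_action(state):
    # Hand-coded expert advice (assumption): always move toward the goal.
    return +1


def step(state, action):
    # Deterministic toy environment: reward 1 only when the goal is reached.
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def shaped_reward(reward, state, next_state, done):
    # Potential-based shaping: r + gamma * phi(s') - phi(s), with phi(terminal) = 0.
    next_phi = 0.0 if done else expert_potential(next_state)
    return reward + GAMMA * next_phi - expert_potential(state)


def train(episodes=300, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                # Exploration step: follow the expert instead of a uniform random action.
                action = expert_action(state)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            target = shaped_reward(reward, state, next_state, done)
            if not done:
                target += GAMMA * max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning temporal difference update on the shaped reward.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    # Greedy action for every non-terminal state.
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the learned action for each non-terminal state. The expert enters in two places only: the shaping term added to the environment reward, and the action taken on exploration steps. The update rule itself is left unchanged.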
Year
2013
DOI
10.1109/ASRU.2013.6707714
Venue
Automatic Speech Recognition and Understanding
Keywords
expert systems, interactive systems, learning (artificial intelligence), optimisation, HIS, expert knowledge, expert-based reward shaping, exploration scheme, goal-oriented dialogue management, hidden information state, learning agent, policy learning, policy optimization, reinforcement learning, temporal difference-based learning, dialogue management, reward shaping, value function approximation
Field
Dialogue management, Temporal difference learning, Computer science, Expert system, Artificial intelligence, Boosting (machine learning), Dilemma, Machine learning, Learning classifier system, Reinforcement learning, Legal expert system
DocType
Conference
Citations
4
PageRank
0.43
References
10
Authors
2
Name                 Order    Citations    PageRank
Emmanuel Ferreira    1        37           4.23
Fabrice Lefèvre      2        185          26.62