Title: Learning to Plan via Deep Optimistic Value Exploration
Abstract: Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function exhibiting optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of models and formulate an upper confidence bound (UCB) objective to recover optimistic estimates. Training the policy on ensemble rollouts with the learned value function as the terminal cost allows long-term interactions to be projected into a limited planning horizon, thus enabling deep optimistic exploration. We do not assume a priori knowledge of either the dynamics or the reward function. We demonstrate that our approach can accommodate both dense and sparse reward signals while improving sample efficiency on a variety of benchmark tasks.

Keywords: Reinforcement Learning; Deep Exploration; Model-Based; Value Function; UCB
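The two mechanisms the abstract names can be illustrated compactly: an optimistic UCB value estimate taken over an ensemble, and a finite-horizon rollout that uses that estimate as terminal cost. The following is a minimal sketch under stated assumptions, not the paper's implementation: the ensemble is faked with random linear heads, and K, BETA, HORIZON, GAMMA, and the toy dynamics/reward/policy are all illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (assumptions, not values from the paper).
K, STATE_DIM, BETA, HORIZON, GAMMA = 5, 3, 1.0, 10, 0.99

# Stand-in for an ensemble of learned value functions: K random linear heads.
value_heads = [rng.normal(size=STATE_DIM) for _ in range(K)]

def ucb_value(state):
    """Optimistic value estimate: ensemble mean plus BETA standard deviations.

    Generic UCB-style construction matching the abstract's description;
    BETA, the optimism weight, is an assumed hyperparameter.
    """
    preds = np.array([w @ state for w in value_heads])
    return preds.mean() + BETA * preds.std()

def rollout_return(dynamics, reward, policy, state):
    """Discounted H-step return with the optimistic value as terminal cost,
    projecting interactions beyond the planning horizon into the objective."""
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        action = policy(state)
        total += discount * reward(state, action)
        state = dynamics(state, action)
        discount *= GAMMA
    return total + discount * ucb_value(state)

# Toy stand-ins so the sketch runs end to end; the paper learns these instead.
dynamics = lambda s, a: 0.9 * s + 0.1 * a
reward = lambda s, a: -float(s @ s)
policy = lambda s: -s

print(rollout_return(dynamics, reward, policy, rng.normal(size=STATE_DIM)))
```

In this reading, training the policy to maximize rollout_return drives it toward states where the ensemble disagrees (high std), which is what makes the exploration optimistic and deep rather than myopic.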
Year: 2020
Venue: L4DC
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors (4):
Name           | Order | Citations | PageRank
Tim Seyde      | 1     | 0         | 1.69
Schwarting, W. | 2     | 43        | 8.25
Sertac Karaman | 3     | 1190      | 87.27
Daniela Rus    | 4     | 7128      | 657.33