Title
Scaling up budgeted reinforcement learning
Abstract
Can we learn a control policy able to adapt its behaviour in real time so as to take any desired amount of risk? The general Reinforcement Learning framework solely aims at optimising a total reward in expectation, which may not be desirable in critical applications. In stark contrast, the Budgeted Markov Decision Process (BMDP) framework is a formalism in which the notion of risk is implemented as a hard constraint on a failure signal. Existing algorithms for solving BMDPs rely on strong assumptions and have so far only been applied to toy examples. In this work, we relax some of these assumptions and demonstrate the scalability of our approach on two practical problems: a spoken dialogue system and an autonomous driving task. On both, we match the performance of Lagrangian Relaxation methods with a significant improvement in sample and memory efficiency.
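A minimal sketch of the two objectives the abstract contrasts, using assumed notation not taken from the record itself (r_t for the reward, c_t for the failure/cost signal, beta for the risk budget, gamma for the discount): the BMDP treats the budget as a hard constraint, whereas Lagrangian Relaxation folds the constraint into the reward with a multiplier lambda.

\[
\max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} r_{t}\Big]
\quad\text{s.t.}\quad
\mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t} c_{t}\Big]\le\beta
\qquad\text{(BMDP: hard constraint)}
\]
\[
\min_{\lambda\ge 0}\ \max_{\pi}\ \mathbb{E}\Big[\textstyle\sum_{t}\gamma^{t}\big(r_{t}-\lambda c_{t}\big)\Big]+\lambda\beta
\qquad\text{(Lagrangian Relaxation)}
\]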
Year: 2019
Venue: arXiv: Learning
DocType: Journal
Volume: abs/1903.01004
Citations: 0
PageRank: 0.34
References: 10
Authors: 6
Name                       Order  Citations  PageRank
Nicolas Carrara            1      0          0.68
Edouard Leurent            2      0          0.34
Romain Laroche             3      110        17.35
Tanguy Urvoy               4      10         1.33
Odalric-Ambrym Maillard    5      171        26.40
Olivier Pietquin           6      664        68.60