Paper Info

Title
Reconnaissance for Reinforcement Learning with Safety Constraints

Abstract
As RL algorithms have grown more powerful and sophisticated, they show promise for several practical applications in the real world. However, safety is a necessary prerequisite to deploying RL systems in real-world domains such as autonomous vehicles or cooperative robotics. Safe RL problems are often formulated as constrained Markov decision processes (CMDPs). Solving CMDPs becomes particularly challenging when safety must be ensured in rare, dangerous situations in stochastic environments. In this paper, we propose an approach for CMDPs where we have access to a generative model (e.g., a simulator) that can preferentially sample rare, dangerous events. Our approach, termed the RP algorithm, decomposes the CMDP into a pair of MDPs, which we term a reconnaissance MDP (R-MDP) and a planning MDP (P-MDP). In the R-MDP, we leverage the generative model to preferentially sample rare, dangerous events and train a threat function, the Q-function analog of danger, which determines the safety level of a given state-action pair. In the P-MDP, we train a reward-seeking policy while using the trained threat function to ensure that the agent considers only safe actions. We show that the RP algorithm enjoys several useful theoretical properties. Moreover, we present an approximate version of the RP algorithm that can significantly reduce the difficulty of solving the R-MDP. We demonstrate the efficacy of our method over classical approaches in multiple tasks, including a collision-free navigation task with dynamic obstacles.
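
The P-MDP step described in the abstract, where a trained threat function restricts the agent to safe actions, can be illustrated with a minimal sketch. This is not the paper's implementation. The function name `safe_greedy_action`, the fixed `threshold`, the tabular per-action inputs, and the fallback to the least threatening action are all assumptions made for illustration.

```python
def safe_greedy_action(q_values, threat_values, threshold):
    """Return the index of the highest-value action among the safe ones.

    q_values[a]      : estimated reward-seeking value of action a (hypothetical input)
    threat_values[a] : threat-function estimate for action a in the current state
    threshold        : largest threat considered acceptable (assumed, not from the paper)
    """
    if len(q_values) != len(threat_values):
        raise ValueError("q_values and threat_values must have the same length")
    if not q_values:
        raise ValueError("at least one action is required")

    # Keep only the actions whose estimated threat is within the threshold.
    safe = [a for a, t in enumerate(threat_values) if t <= threshold]

    # Assumed fallback for this sketch: if no action is safe,
    # pick the least threatening one.
    if not safe:
        return min(range(len(threat_values)), key=lambda a: threat_values[a])

    # Otherwise act greedily with respect to reward among the safe actions.
    return max(safe, key=lambda a: q_values[a])
```

For example, with values `[1.0, 5.0, 3.0]`, threats `[0.01, 0.9, 0.05]`, and a threshold of `0.1`, the sketch skips the highest-value action (index 1) because its threat is too high and returns index 2.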

Year	DOI	Venue
2021	10.1007/978-3-030-86520-7_35	MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II
Keywords	DocType	Volume
Safe reinforcement learning, Constrained MDPs, Safety	Conference	12976
ISSN	Citations	PageRank
0302-9743	0	0.34
References	Authors
0	6

Authors (6 rows)

Name	Order	Citations	PageRank
Shin-ichi Maeda	1	238	13.16
Hayato Watahiki	2	0	0.34
Yi Ouyang	3	18	7.08
Shintarou Okada	4	0	0.34
Masanori Koyama	5	208	7.80
Prabhat Nagarajan	6	1	1.37
