Title
Reconnaissance for Reinforcement Learning with Safety Constraints
Abstract
As RL algorithms have grown more powerful and sophisticated, they show promise for several practical applications in the real world. However, safety is a necessary prerequisite to deploying RL systems in real world domains such as autonomous vehicles or cooperative robotics. Safe RL problems are often formulated as constrained Markov decision processes (CMDPs). In particular, solving CMDPs becomes challenging when safety must be ensured in rare, dangerous situations in stochastic environments. In this paper, we propose an approach for CMDPs where we have access to a generative model (e.g. a simulator) that can preferentially sample rare, dangerous events. In particular, our approach, termed the RP algorithm decomposes the CMDP into a pair of MDPs which we term a reconnaissance MDP (R-MDP) and a planning MDP (P-MDP). In the R-MDP, we leverage the generative model to preferentially sample rare, dangerous events and train a threat function, the Q-function analog of danger that can determine the safety level of a given state-action pair. In the P-MDP, we train a reward-seeking policy while using the trained threat function to ensure that the agent considers only safe actions. We show that our approach, termed the RP algorithm enjoys several useful theoretical properties. Moreover, we present an approximate version of the RP algorithm that can significantly reduce the difficulty of solving the R-MDP. We demonstrate the efficacy of our method over classical approaches in multiple tasks, including a collision-free navigation task with dynamic obstacles.
Year
DOI
Venue
2021
10.1007/978-3-030-86520-7_35
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II
Keywords
DocType
Volume
Safe reinforcement learning, Constrained MDPs, Safety
Conference
12976
ISSN
Citations 
PageRank 
0302-9743
0
0.34
References 
Authors
0
6
Name
Order
Citations
PageRank
Shin-ichi Maeda123813.16
Hayato Watahiki200.34
Yi Ouyang3187.08
Shintarou Okada400.34
Masanori Koyama52087.80
Prabhat Nagarajan611.37