Title
SEM: Safe exploration mask for q-learning
Abstract
Most reinforcement learning algorithms focus on discovering the optimal policy to maximize reward while neglecting safety during the exploration stage, which is unacceptable in industrial applications. This paper concerns an efficient method to improve the safety of the agent during the exploration stage of Q-learning without any prior knowledge. We propose a novel approach, named the safe exploration mask, that reduces the number of safety violations in Q-learning by modifying the transition probabilities of the environment. To this end, a safety indicator function consisting of a distance metric and a controllability metric is designed. The safety indicator function can be learned by the agent through bootstrapping, without an additional optimization solver. We prove that the safety indicator function converges in tabular Q-learning, and we introduce two tricks to mitigate divergence in approximation-based Q-learning. Based on the safety indicator function, the safe exploration mask is generated to modify the original exploration policy by reducing the transition probability of unsafe actions. Finally, simulations in both discrete and continuous environments demonstrate the advantages, feasibility, and safety of our method for both discrete and continuous Q-learning algorithms.
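The masking idea described in the abstract can be illustrated with a minimal sketch: an epsilon-greedy exploration distribution is reweighted so that actions whose safety indicator falls below a threshold receive zero exploration probability. The function name, the threshold parameter, and the greedy fallback are illustrative assumptions, not details taken from the paper (which learns the safety indicator by bootstrapping from a distance metric and a controllability metric).

```python
def masked_exploration_policy(q_values, safety, threshold=0.5, epsilon=0.1):
    """Epsilon-greedy exploration reweighted by a safe-exploration mask.

    q_values  -- list of Q(s, a) values for each action a in state s
    safety    -- list of safety-indicator values S(s, a) in [0, 1]
                 (hypothetical stand-in for the paper's learned indicator)
    Actions with S(s, a) < threshold get zero exploration probability;
    the remaining probability mass is renormalized over safe actions.
    """
    n = len(q_values)
    greedy = max(range(n), key=q_values.__getitem__)
    # Base epsilon-greedy distribution.
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    # Apply the mask: suppress unsafe actions, then renormalize.
    masked = [p if s >= threshold else 0.0 for p, s in zip(probs, safety)]
    total = sum(masked)
    if total == 0.0:
        # All actions flagged unsafe: fall back to the greedy action
        # (an assumed tie-breaking rule, not specified in the abstract).
        masked = [0.0] * n
        masked[greedy] = 1.0
        total = 1.0
    return [p / total for p in masked]
```

Sampling the next action from the returned distribution, instead of from plain epsilon-greedy, is what reduces the transition probability of unsafe actions during exploration.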
Year
2022
DOI
10.1016/j.engappai.2022.104765
Venue
Engineering Applications of Artificial Intelligence
Keywords
Reinforcement learning, Safe exploration, Fuzzy Q-learning, Safe reinforcement learning
DocType
Journal
Volume
111
ISSN
0952-1976
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order  Citations  PageRank
Chengbin Xuan   1      0          0.34
Feng Zhang      2      0          0.34
H. K. Lam       3      3618       193.15