Title
SEM: Safe exploration mask for q-learning
Abstract
Most reinforcement learning algorithms focus on discovering the optimal policy to maximize reward while neglecting safety during the exploration stage, which is unacceptable in industrial applications. This paper concerns an efficient method to improve the safety of the agent during the exploration stage of Q-learning without any prior knowledge. We propose a novel approach, named the safe exploration mask, that reduces the number of safety violations in Q-learning by modifying the transition probabilities of the environment. To this end, a safety indicator function consisting of a distance metric and a controllability metric is designed. The safety indicator function can be learned by the agent through bootstrapping, without an additional optimization solver. We prove that the safety indicator function converges in tabular Q-learning, and we introduce two tricks to mitigate divergence in approximation-based Q-learning. Based on the safety indicator function, the safe exploration mask is generated to modify the original exploration policy by reducing the transition probability of unsafe actions. Finally, simulations in both discrete and continuous environments demonstrate the advantages, feasibility, and safety of our method for both discrete and continuous Q-learning algorithms.
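The masking idea described in the abstract can be illustrated with a minimal sketch: an epsilon-greedy exploration distribution is reweighted so that actions whose safety indicator falls below a threshold receive zero exploration probability. The function name, the threshold parameter, and the greedy fallback are illustrative assumptions, not details taken from the paper (which learns the safety indicator by bootstrapping from a distance metric and a controllability metric).

```python
def masked_exploration_policy(q_values, safety, threshold=0.5, epsilon=0.1):
    """Epsilon-greedy exploration reweighted by a safe-exploration mask.

    q_values  -- list of Q(s, a) values for each action a in state s
    safety    -- list of safety-indicator values S(s, a) in [0, 1]
                 (hypothetical stand-in for the paper's learned indicator)
    Actions with S(s, a) < threshold get zero exploration probability;
    the remaining probability mass is renormalized over safe actions.
    """
    n = len(q_values)
    greedy = max(range(n), key=q_values.__getitem__)
    # Base epsilon-greedy distribution.
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    # Apply the mask: suppress unsafe actions, then renormalize.
    masked = [p if s >= threshold else 0.0 for p, s in zip(probs, safety)]
    total = sum(masked)
    if total == 0.0:
        # All actions flagged unsafe: fall back to the greedy action
        # (an assumed tie-breaking rule, not specified in the abstract).
        masked = [0.0] * n
        masked[greedy] = 1.0
        total = 1.0
    return [p / total for p in masked]
```

Sampling the next action from the returned distribution, instead of from plain epsilon-greedy, is what reduces the transition probability of unsafe actions during exploration.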
Year
2022
DOI
10.1016/j.engappai.2022.104765
Venue
Engineering Applications of Artificial Intelligence
Keywords
Reinforcement learning, Safe exploration, Fuzzy Q-learning, Safe reinforcement learning
DocType
Journal
Volume
111
ISSN
0952-1976
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order  Citations  PageRank
Chengbin Xuan   1      0          0.34
Feng Zhang      2      0          0.34
H. K. Lam       3      3618       193.15