Abstract

We present a reinforcement learning approach for exploring and optimizing a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, defined as states whose safety function value falls outside a given tolerance. The safety values of the states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off: exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach in a range of experiments, including a simulation using real Martian terrain data.
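The two GP-based ingredients the abstract mentions, a pessimistic safety estimate and an information-gain exploration score, can be sketched in a few lines. The following is a minimal illustration, not the authors' algorithm: the toy 1-D state space, the RBF kernel, and the parameters `beta`, `safety_tolerance`, and `noise_std` are all assumptions introduced here, and the information-gain term is the generic GP formula rather than the paper's reward-aware criterion.

```python
# Minimal sketch (illustrative only) of GP-based safe exploration:
# (1) classify states as safe via a GP lower confidence bound,
# (2) rank safe states by GP information gain.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy 1-D state space and a few noisy safety observations (all hypothetical).
states = np.linspace(0.0, 10.0, 101).reshape(-1, 1)
noise_std = 0.05
true_safety = np.sin(states.ravel()) + 1.0
observed_idx = rng.choice(len(states), size=8, replace=False)
y_obs = true_safety[observed_idx] + noise_std * rng.normal(size=8)

# GP posterior over the safety function, as described in the abstract.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                              alpha=noise_std**2, normalize_y=True)
gp.fit(states[observed_idx], y_obs)
mean, std = gp.predict(states, return_std=True)

# (1) Pessimistic safety check: a state counts as safe only if its GP lower
# confidence bound clears the tolerance; `beta` and `safety_tolerance` are
# assumed values, not taken from the paper.
beta, safety_tolerance = 2.0, 0.5
safe_mask = (mean - beta * std) >= safety_tolerance

# (2) Standard information gain of one noisy GP observation at a state:
# I = 0.5 * log(1 + posterior_var / noise_var). A selective explorer would
# visit the safe state where this gain matters most for cumulative reward.
info_gain = 0.5 * np.log1p(std**2 / noise_std**2)
candidates = np.flatnonzero(safe_mask)
if candidates.size:
    best = candidates[np.argmax(info_gain[candidates])]
    print(f"{safe_mask.sum()} safe states; most informative: s = {states[best, 0]:.2f}")
```

The lower-confidence-bound test errs on the side of caution: states the GP is still uncertain about are treated as unsafe until further observations shrink their posterior variance, which mirrors the abstract's constraint on the probability of entering unsafe states.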
Year | Venue | Field
---|---|---|
2018 | Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence | Mathematical optimization, Computer science, Gaussian process

DocType | Citations | PageRank
---|---|---|
Conference | 1 | 0.35

References | Authors
---|---|
10 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Akifumi Wachi | 1 | 3 | 2.40 |
Yanan Sui | 2 | 30 | 5.85 |
Yisong Yue | 3 | 1200 | 73.85 |
Masahiro Ono | 4 | 133 | 14.40 |