Title
Corruption-Robust Offline Reinforcement Learning
Abstract
We study adversarial robustness in offline reinforcement learning. Given a batch dataset consisting of tuples (s, a, r, s'), an adversary is allowed to arbitrarily modify an epsilon fraction of the tuples. From the corrupted dataset the learner aims to robustly identify a near-optimal policy. We first show that a worst-case Omega(H d epsilon) optimality gap is unavoidable in linear MDPs of dimension d, even if the adversary only corrupts the reward element of a tuple. This contrasts with dimension-free results in robust supervised learning and with the best-known lower bound in the online RL setting with corruption. Next, we propose robust variants of the Least-Squares Value Iteration (LSVI) algorithm that utilize robust supervised learning oracles and achieve near-matching performance both with and without global data coverage. The algorithm requires knowledge of epsilon to design the pessimism bonus in the no-coverage case. Surprisingly, this knowledge of epsilon is necessary: we show that being adaptive to an unknown epsilon is impossible. This again contrasts with recent results on corruption-robust online RL and implies that corruption-robust offline RL is a strictly harder problem.
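The abstract describes a pessimistic, corruption-robust variant of LSVI in which a robust regression oracle replaces ordinary least squares and the pessimism bonus depends on the corruption level epsilon. Below is a minimal sketch of that backward value-iteration structure, written under stated assumptions: the function name robust_pessimistic_lsvi, the ridge-regression stand-in for the robust oracle, and the exact form of the eps-scaled bonus are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def robust_pessimistic_lsvi(data, feature_fn, actions, H, d, eps,
                            lam=1.0, beta=1.0, c=1.0):
    """Backward least-squares value iteration on a batch dataset.

    data[h] is a list of (s, a, r, s_next) tuples collected at step h, an
    eps fraction of which may be adversarially corrupted.  Returns, for each
    step, the fitted weight vector and the inverse covariance used for the
    pessimism bonus.
    """
    V_next = lambda s: 0.0                      # V_{H+1} := 0
    fitted = [None] * H

    for h in reversed(range(H)):
        batch = data[h]
        n = len(batch)
        Phi = np.array([feature_fn(s, a) for (s, a, _, _) in batch])    # n x d features
        y = np.array([r + V_next(sp) for (_, _, r, sp) in batch])       # regression targets
        Lam = Phi.T @ Phi + lam * np.eye(d)
        Lam_inv = np.linalg.inv(Lam)

        # Ridge regression stands in for the robust regression oracle of the
        # paper; a robust solver (e.g. Huber or trimmed least squares) would
        # be plugged in here instead.
        w = Lam_inv @ (Phi.T @ y)
        fitted[h] = (w, Lam_inv)

        def V(s, w=w, Lam_inv=Lam_inv, n=n):
            q_best = 0.0                        # values are kept in [0, H]
            for a in actions:
                x = feature_fn(s, a)
                width = np.sqrt(x @ Lam_inv @ x)
                # Pessimism bonus: the usual elliptical term plus an
                # illustrative eps-dependent term accounting for corruption;
                # the paper's exact bonus differs.
                q = x @ w - (beta + c * eps * n) * width
                q_best = max(q_best, q)
            return min(q_best, float(H))
        V_next = V

    return fitted
```

The eps-dependent term in the bonus reflects that an adversary controlling an eps fraction of the n samples at a step can shift the regression solution by a corresponding amount; per the abstract, some dependence on a known epsilon cannot be avoided in the no-coverage case.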
Year
2022
Venue
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151
DocType
Conference
Volume
151
ISSN
2640-3498
Citations
0
PageRank
0.34
References
0
Authors
4
Name            Order  Citations  PageRank
Xuezhou Zhang   1      1          4.41
Yiding Chen     2      0          0.34
Xiaojin Zhu     3      3586       222.74
Wen Sun         4      281        0.46