Abstract
---
Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
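The abstract describes estimating uncertainty with dropout and down-weighting uncertain (likely OOD) samples in the training objective. Below is a minimal, hypothetical sketch of that idea in PyTorch, not the authors' released implementation: the names `QNetwork`, `mc_dropout_q`, `beta`, `n_dropout_samples`, and the inverse-variance weighting rule are illustrative assumptions.

```python
# Sketch: MC-dropout uncertainty estimate of the target Q-value, used to
# down-weight high-uncertainty samples in the critic's Bellman loss.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Q(s, a) network with dropout layers so MC dropout can be applied."""

    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def mc_dropout_q(q_net, state, action, n_samples=20):
    """Mean and variance of Q(s, a) from repeated stochastic dropout passes."""
    q_net.train()  # keep dropout active at evaluation time
    with torch.no_grad():
        samples = torch.stack([q_net(state, action) for _ in range(n_samples)], dim=0)
    return samples.mean(dim=0), samples.var(dim=0)


def uncertainty_weighted_critic_loss(q_net, target_q_net, batch,
                                     gamma=0.99, beta=1.0, n_dropout_samples=20):
    """Bellman error, down-weighted where the target Q-value is uncertain."""
    s, a, r, s_next, a_next, done = batch  # a_next drawn from the target policy
    target_mean, target_var = mc_dropout_q(target_q_net, s_next, a_next, n_dropout_samples)
    td_target = r + gamma * (1.0 - done) * target_mean
    # Illustrative weighting: inversely proportional to estimated variance,
    # clamped so low-uncertainty samples keep an ordinary unit weight.
    weight = (beta / (target_var + 1e-6)).clamp(max=1.0).detach()
    q_pred = q_net(s, a)
    return (weight * (q_pred - td_target.detach()).pow(2)).mean()
```

Clamping and detaching the weight keeps the loss bounded and prevents gradients from flowing through the uncertainty estimate itself; both are sketch-level design choices, not details taken from the paper.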
Year | Venue | DocType
---|---|---
2021 | INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | Conference

Volume | ISSN | Citations
---|---|---
139 | 2640-3498 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Yue Wu | 1 | 0 | 0.34 |
Shuangfei Zhai | 2 | 99 | 10.00
Nitish Srivastava | 3 | 5645 | 318.34 |
Joshua Susskind | 4 | 194 | 9.68 |
Jian Zhang | 5 | 0 | 0.34 |
Ruslan Salakhutdinov | 6 | 12190 | 764.15 |
Hanlin Goh | 7 | 97 | 8.27 |