Abstract
---
Offline Reinforcement Learning promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
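The abstract describes estimating uncertainty with dropout and down-weighting uncertain (likely OOD) samples in the training objective. Below is a minimal, hypothetical sketch of that idea in PyTorch, not the authors' released implementation: the names `QNetwork`, `mc_dropout_q`, `beta`, `n_dropout_samples`, and the inverse-variance weighting rule are illustrative assumptions.

```python
# Sketch: MC-dropout uncertainty estimate of the target Q-value, used to
# down-weight high-uncertainty samples in the critic's Bellman loss.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Q(s, a) network with dropout layers so MC dropout can be applied."""

    def __init__(self, state_dim, action_dim, hidden=256, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def mc_dropout_q(q_net, state, action, n_samples=20):
    """Mean and variance of Q(s, a) from repeated stochastic dropout passes."""
    q_net.train()  # keep dropout active at evaluation time
    with torch.no_grad():
        samples = torch.stack([q_net(state, action) for _ in range(n_samples)], dim=0)
    return samples.mean(dim=0), samples.var(dim=0)


def uncertainty_weighted_critic_loss(q_net, target_q_net, batch,
                                     gamma=0.99, beta=1.0, n_dropout_samples=20):
    """Bellman error, down-weighted where the target Q-value is uncertain."""
    s, a, r, s_next, a_next, done = batch  # a_next drawn from the target policy
    target_mean, target_var = mc_dropout_q(target_q_net, s_next, a_next, n_dropout_samples)
    td_target = r + gamma * (1.0 - done) * target_mean
    # Illustrative weighting: inversely proportional to estimated variance,
    # clamped so low-uncertainty samples keep an ordinary unit weight.
    weight = (beta / (target_var + 1e-6)).clamp(max=1.0).detach()
    q_pred = q_net(s, a)
    return (weight * (q_pred - td_target.detach()).pow(2)).mean()
```

Clamping and detaching the weight keeps the loss bounded and prevents gradients from flowing through the uncertainty estimate itself; both are sketch-level design choices, not details taken from the paper.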
Year | Venue | DocType
---|---|---
2021 | INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | Conference

Volume | ISSN | Citations
---|---|---
139 | 2640-3498 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 7
Name | Order | Citations | PageRank |
---|---|---|---|
Yue Wu | 1 | 0 | 0.34 |
Shuangfei Zhai | 2 | 99 | 10.00
Nitish Srivastava | 3 | 5645 | 318.34 |
Joshua Susskind | 4 | 194 | 9.68 |
Jian Zhang | 5 | 0 | 0.34 |
Ruslan Salakhutdinov | 6 | 12190 | 764.15 |
Hanlin Goh | 7 | 97 | 8.27 |