Abstract
Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges regarding the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.
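The abstract does not spell out the update rule, but the core idea it describes -- regressing the policy onto dataset actions, weighted by how much the critic favors them -- can be sketched briefly. The PyTorch snippet below is a minimal illustration under stated assumptions: the network shapes, the exponential advantage weighting with clipping, the Monte-Carlo value baseline, and all names (`Actor`, `Critic`, `crr_policy_loss`, `beta`, `max_w`) are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy; a stand-in for the paper's actor network."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                  nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return torch.distributions.Normal(self.mean(obs), self.log_std.exp())

class Critic(nn.Module):
    """State-action value network Q(s, a); also a stand-in."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                               nn.Linear(64, 1))

    def forward(self, obs, act):
        return self.q(torch.cat([obs, act], dim=-1)).squeeze(-1)

def crr_policy_loss(actor, critic, obs, act, beta=1.0, n_samples=4, max_w=20.0):
    """Critic-regularized regression sketch: behavioral cloning on dataset
    actions, reweighted by exp(advantage / beta) under the current critic.
    This is the exponential-weighting variant; beta and max_w are assumed."""
    dist = actor.dist(obs)
    log_prob = dist.log_prob(act).sum(-1)                  # log pi(a|s), per example
    q = critic(obs, act)
    # Monte-Carlo baseline: V(s) ~= mean of Q(s, a') over policy samples a'.
    sampled = dist.sample((n_samples,))                    # (n, batch, act_dim)
    obs_rep = obs.unsqueeze(0).expand(n_samples, *obs.shape)
    v = critic(obs_rep.reshape(-1, obs.shape[-1]),
               sampled.reshape(-1, act.shape[-1])).reshape(n_samples, -1).mean(0)
    weight = torch.clamp(torch.exp((q - v) / beta), max=max_w)
    return -(weight.detach() * log_prob).mean()            # weighted regression loss

# Toy usage: one gradient step on a random batch.
actor, critic = Actor(11, 3), Critic(11, 3)
obs, act = torch.randn(32, 11), torch.randn(32, 3)
crr_policy_loss(actor, critic, obs, act).backward()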
Year | Venue | DocType |
---|---|---|
2020 | NeurIPS 2020 | Conference
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0
Authors (11)
Name | Order | Citations | PageRank |
---|---|---|---|
Ziyu Wang | 1 | 372 | 23.71 |
Alexander Novikov | 2 | 98 | 7.62 |
Konrad Żołna | 3 | 7 | 4.51
Josh S. Merel | 4 | 143 | 11.34 |
Jost Tobias Springenberg | 5 | 1126 | 62.86 |
Scott Reed | 6 | 1750 | 80.25
Bobak Shahriari | 7 | 283 | 12.43 |
Noah Siegel | 8 | 5 | 2.48 |
Çağlar Gülçehre | 9 | 3010 | 133.22
Nicolas Heess | 10 | 1762 | 94.77 |
Nando de Freitas | 11 | 3284 | 273.68