Abstract
---
Offline reinforcement learning (RL) aims to train an agent solely from a dataset of historical interactions with the environment, without any further costly or dangerous active exploration. Model-based RL (MbRL) usually achieves promising performance in the offline setting thanks to its high sample efficiency and compact modeling of the environment's dynamics. However, it can suffer from bias and error accumulation in the model's predictions. Existing methods address this problem by adding a penalty term to the model reward, but they require careful hand-tuning of the penalty and its weight. Instead, in this paper we formulate model-based offline RL as a bi-objective optimization, where the first objective maximizes the model return and the second objective adapts to the learning dynamics of the RL policy. We therefore need not tune the penalty or its weight, yet we achieve a more advantageous trade-off between the final model return and the model's uncertainty. We develop an efficient and adaptive policy optimization algorithm equipped with an evolution strategy to solve the bi-objective optimization, named BiES. Experimental results on the D4RL benchmark show that our approach sets a new state of the art and significantly outperforms existing offline RL methods on long-horizon tasks.
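To make the abstract's core idea concrete, the following is a minimal sketch of an evolution-strategy loop for a two-objective trade-off. It is NOT the paper's BiES algorithm: the objectives `f1` (a stand-in for model return) and `f2` (a stand-in for model uncertainty) are toy quadratics, and the trade-off weight is fixed here, whereas the paper's second objective adapts to the policy's learning dynamics.

```python
import numpy as np

# Toy stand-ins (NOT the paper's objectives): f1 rewards being near a
# target point, f2 penalizes distance from the origin as a crude
# "uncertainty" proxy.
def f1(theta):
    return -np.sum((theta - 3.0) ** 2)

def f2(theta):
    return np.sum(theta ** 2)

def bi_objective_es(theta, weight=0.1, sigma=0.1, lr=0.05, pop=50,
                    iters=300, seed=0):
    """Plain-vanilla evolution strategy on the fixed scalarization
    f1 - weight * f2 (BiES instead adapts this trade-off)."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        # Sample Gaussian perturbations of the current parameters.
        eps = rng.standard_normal((pop, theta.size))
        scores = np.array([
            f1(theta + sigma * e) - weight * f2(theta + sigma * e)
            for e in eps
        ])
        # Mean baseline reduces variance of the gradient estimate.
        scores = scores - scores.mean()
        # ES gradient-ascent step on the scalarized objective.
        theta = theta + lr / (pop * sigma) * eps.T @ scores
    return theta

theta = bi_objective_es(np.zeros(2))
# Converges near the optimum of f1 - 0.1 * f2, i.e. theta ≈ 2.73 per
# coordinate (solve 2(x - 3) + 0.2x = 0).
```

The fixed `weight` plays the role that a hand-tuned uncertainty penalty plays in prior MbRL methods; the paper's contribution is precisely to remove that hand-tuning.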
Year | DOI | Venue
---|---|---
2022 | 10.1007/978-3-030-97546-3_46 | AI 2021: Advances in Artificial Intelligence

Keywords | DocType | Volume
---|---|---
Offline reinforcement learning, Multi-objective optimization, Evolution strategy | Conference | 13151

ISSN | Citations | PageRank
---|---|---
0302-9743 | 0 | 0.34

References | Authors
---|---
3 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Yijun Yang | 1 | 0 | 1.01 |
Jing Jiang | 2 | 130 | 19.52 |
Zhuowei Wang | 3 | 0 | 0.34 |
Qiqi Duan | 4 | 6 | 3.13 |
Yuhui Shi | 5 | 4397 | 435.39 |