Title: Learning to Build High-Fidelity and Robust Environment Models
Abstract: This paper is concerned with robust learning to simulate (RL2S), a new reinforcement learning (RL) problem that focuses on learning a high-fidelity environment model (i.e., simulator) for serving diverse downstream tasks. Unlike environment learning in model-based RL, where the learned dynamics model is suited only to providing simulated data for one specific policy, the goal of RL2S is to build a simulator that remains high-fidelity when interacting with various policies. The robustness of the simulator (i.e., its ability to provide accurate simulations to various policies) over diverse corner cases (policies) is therefore the key challenge to address. By formulating the policy–environment interaction as a dual Markov decision process, we transform RL2S into a novel robust imitation learning problem and propose efficient algorithms to solve it. Experiments on continuous control scenarios demonstrate that the RL2S-enabled methods outperform the others in learning high-fidelity simulators for evaluating, ranking, and training various policies.
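The abstract's dual-MDP idea can be illustrated with a toy sketch (not the paper's algorithm): in the dual view, the environment model plays the role of a "policy" whose observation is a (state, action) pair and whose "action" is the next state, so learning the simulator from logged transitions reduces to an imitation-style fitting problem. The linear dynamics, the behavior data, and the least-squares fit below are all hypothetical illustrations, assuming noiseless linear transitions.

```python
# Hedged sketch of the dual-MDP view: the simulator is a "policy"
# pi(s' | s, a) fitted by imitating logged environment transitions.
# All names and the linear-dynamics assumption here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) linear dynamics: s' = A s + B a
A_true = np.array([[0.9, 0.1], [0.0, 0.95]])
B_true = np.array([[0.0], [0.5]])

# Logged transitions (s, a, s') collected by some behavior policy
S = rng.normal(size=(200, 2))
Act = rng.normal(size=(200, 1))
S_next = S @ A_true.T + Act @ B_true.T

# Imitation step: the dual-MDP "state" is (s, a); regress the next state
X = np.hstack([S, Act])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)

# With noiseless data, the recovered model matches [A_true | B_true]
recovered = W.T
print(np.allclose(recovered, np.hstack([A_true, B_true]), atol=1e-6))
```

A robust variant, as the abstract suggests, would additionally require the fitted model to stay accurate under transitions generated by many different behavior policies, not just the one that produced the log.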
Year: 2021
DOI: 10.1007/978-3-030-86486-6_7
Venue: MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES
Keywords: Simulator, Imitation learning, Robust learning
DocType: Conference
Volume: 12975
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 0
Authors: 8
Order  Name           Citations/PageRank
1      Weinan Zhang   122897.24
2      Zhengyu Yang   678.51
3      Jian Shen      225.46
4      Minghuan Liu   00.68
5      Yimin Huang    73.61
6      Xing Zhang     15532.89
7      Ruiming Tang   397.21
8      Zhenguo Li     58141.17