Title
Beyond backpropagate through time: Efficient model-based training through time-splitting
Abstract
Model-based policy gradient (MBPG) has been employed to seek approximate solutions to optimal control problems. However, temporal dependencies couple adjacent states, so training time grows linearly with the time horizon. This paper reshapes the training process of MBPG with the time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable together with an equivalence constraint is introduced to keep the problem consistent. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and share policy networks, so they can be optimized concurrently. Guided by this design, the paper further proposes an asynchronous parallel training scheme to accelerate training. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time on a trajectory tracking task.
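The decoupling idea in the abstract can be illustrated with a minimal sketch. It does not reproduce the T3S loss functions, which the abstract does not state. Everything here is an assumption made for illustration: the scalar linear dynamics, the linear policy `u = -k * x` standing in for the shared policy network, the quadratic stage cost, and the quadratic penalty with weight `RHO` on the consistency gap. Each stage holds its own copy of the state, so every stage loss depends only on local variables and can be evaluated independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor

A, B = 0.9, 0.5   # hypothetical scalar dynamics: x_next = A*x + B*u
Q, R = 1.0, 0.1   # hypothetical quadratic stage-cost weights
RHO = 10.0        # hypothetical penalty weight on the consistency gap


def policy(x, k):
    """Shared linear policy u = -k*x (stand-in for a policy network)."""
    return -k * x


def step(x, u):
    """One step of the assumed dynamics model."""
    return A * x + B * u


def rollout(x0, k, horizon):
    """Sequential rollout: each state depends on the previous one."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(step(xs[-1], policy(xs[-1], k)))
    return xs


def sequential_cost(x0, k, horizon):
    """Accumulated cost along the coupled trajectory."""
    total = 0.0
    for x in rollout(x0, k, horizon)[:-1]:
        u = policy(x, k)
        total += Q * x * x + R * u * u
    return total


def stage_loss(args):
    """Loss of one subproblem; it reads only its own copies z_t and z_next."""
    z_t, z_next, k = args
    u = policy(z_t, k)
    gap = z_next - step(z_t, u)  # residual of the consistency constraint
    return Q * z_t * z_t + R * u * u + 0.5 * RHO * gap * gap


def split_cost(z, k):
    """Evaluate every stage loss as an independent job and sum the results."""
    jobs = [(z[t], z[t + 1], k) for t in range(len(z) - 1)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(stage_loss, jobs))


x0, k, horizon = 1.0, 0.8, 20
z_consistent = rollout(x0, k, horizon)   # copies that satisfy the constraint
seq_total = sequential_cost(x0, k, horizon)
split_total = split_cost(z_consistent, k)
```

When the state copies satisfy the consistency constraint, the penalty vanishes and the sum of stage losses matches the sequential cost. The thread pool only demonstrates that the stage losses have no data dependencies on each other; it is not the asynchronous parallel training scheme of the paper, and Python threads will not yield a real speedup for this arithmetic.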
Year: 2022
DOI: 10.1002/int.22928
Venue: INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Keywords: model-based policy gradient, optimal control, parallel training, reinforcement learning, time-splitting
DocType: Journal
Volume: 37
Issue: 10
ISSN: 0884-8173
Citations: 0
PageRank: 0.34
References: 0
Authors: 9
Name            Order  Citations  PageRank
Jiaxin Gao      1      0          0.34
Yang Guan       2      0          0.34
Wenyu Li        3      81         15.96
Shengbo Li      4      535        50.07
Fei Ma          5      2          4.59
Jianfeng Zheng  6      0          0.34
Junqing Wei     7      0          0.34
Bo Zhang        8      0          0.34
Keqiang Li      9      583        52.39