Beyond backpropagate through time: Efficient model-based training through time-splitting - Citegraph

Paper Info

Title
Beyond backpropagate through time: Efficient model-based training through time-splitting

Abstract
Model-based policy gradient (MBPG) has been employed to seek an approximate solution to the optimal control problem. However, temporal dependencies couple adjacent states, so the training time grows linearly with the time horizon. This paper reshapes the training process of MBPG with the time-splitting technique to establish a time-independent algorithm called Training Through Time-Splitting (T3S). First, the coupled variables are copied to obtain two independent variables, and an extra variable together with an equivalence constraint is introduced to keep the problem consistent. Then, the transformed problem is divided into subproblems with carefully derived loss functions. The subproblems have decoupled variables and share policy networks, so they can be optimized concurrently. Guided by the algorithm design, this paper further proposes an asynchronous parallel training scheme to improve training efficiency. Numerical simulation shows that the T3S algorithm outperforms the MBPG algorithm by 83.6% in wall-clock time on a trajectory tracking task.
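
The splitting idea in the abstract can be illustrated with a toy sketch. It is not the T3S algorithm: the scalar dynamics, the linear policy gain `k`, the stage cost, and the quadratic penalty weighted by `rho` are all assumptions chosen for illustration, and the penalty only stands in for the paper's equivalence constraint and derived loss functions. The sketch shows that once every step starts from its own state copy, each term depends only on two adjacent copies, and that the split cost matches the sequential rollout cost when the copies are consistent.

```python
# Toy illustration of time-splitting (assumed setup, not the paper's T3S losses).
# System: x_{t+1} = a*x_t + b*u_t, policy: u_t = -k*x_t (all values hypothetical).

def step(x, k, a=0.9, b=0.5):
    """Apply the policy once; return the next state and the stage cost."""
    u = -k * x
    cost = x * x + 0.1 * u * u
    return a * x + b * u, cost

def rollout_cost(x0, k, horizon):
    """Coupled form: each state is computed from the previous one, in sequence."""
    x, total = x0, 0.0
    for _ in range(horizon):
        x, c = step(x, k)
        total += c
    return total

def split_cost(copies, k, rho=1.0):
    """Split form: step t starts from its own copy copies[t].

    Each term uses only (copies[t], copies[t+1]), so the terms could be
    evaluated independently. The quadratic penalty is an assumed stand-in
    for the equivalence constraint between a state and its copy.
    """
    total = 0.0
    for t in range(len(copies) - 1):
        nxt, c = step(copies[t], k)
        total += c + rho * (nxt - copies[t + 1]) ** 2
    return total

x0, k, horizon = 1.0, 0.4, 5

# Consistent copies: each copy equals the state the dynamics would produce.
copies = [x0]
for _ in range(horizon):
    copies.append(step(copies[-1], k)[0])

coupled = rollout_cost(x0, k, horizon)
split = split_cost(copies, k)

# Inconsistent copies: one copy is perturbed, so the penalty becomes active.
perturbed = list(copies)
perturbed[2] += 0.1
split_perturbed = split_cost(perturbed, k)
```

With consistent copies the penalty vanishes and `split` agrees with `coupled`; perturbing one copy makes `split_perturbed` larger, which is the signal an optimizer would use to pull the copies back into agreement.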

Field	Value
Year	2022
DOI	10.1002/int.22928
Venue	INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS
Keywords	model-based policy gradient, optimal control, parallel training, reinforcement learning, time-splitting
DocType	Journal
Volume	37
Issue	10
ISSN	0884-8173
Citations	0
PageRank	0.34
References	0
Authors	9

Authors (9 rows)

Name	Order	Citations	PageRank
Jiaxin Gao	1	0	0.34
Yang Guan	2	0	0.34
Wenyu Li	3	81	15.96
Shengbo Li	4	535	50.07
Fei Ma	5	2	4.59
Jianfeng Zheng	6	0	0.34
Junqing Wei	7	0	0.34
Bo Zhang	8	0	0.34
Keqiang Li	9	583	52.39
