Title
Q-learning for Optimal Control of Continuous-time Systems.
Abstract
In this paper, two Q-learning (QL) methods are proposed and their convergence theories are established for addressing the model-free optimal control problem of general nonlinear continuous-time systems. By introducing the Q-function for continuous-time systems, policy iteration based QL (PIQL) and value iteration based QL (VIQL) algorithms are proposed for learning the optimal control policy from real system data rather than from a mathematical system model. It is proved that both the PIQL and VIQL methods generate a nonincreasing Q-function sequence that converges to the optimal Q-function. For the implementation of the QL algorithms, the method of weighted residuals is applied to derive the parameter update rule. The developed PIQL and VIQL algorithms are essentially off-policy reinforcement learning approaches, where the system data can be collected arbitrarily, which increases the exploration ability. With the data collected from the real system, the QL methods learn the optimal control policy offline, and the convergent control policy is then applied to the real system. The effectiveness of the developed QL algorithms is verified through computer simulation.
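As a rough illustration of the batch, off-policy Q-learning idea described in the abstract, the following Python sketch fits a quadratic Q-function to one-step Bellman targets computed from pre-collected transitions. The dynamics, cost, basis functions, discount factor, control grid, and the least-squares fit are all illustrative assumptions; they stand in for the paper's continuous-time formulation and its weighted-residuals parameter update and are not the authors' algorithm.

# Hedged sketch: value-iteration-style Q-learning from arbitrarily collected
# (off-policy) data, with a linear-in-parameters Q-function. All modeling
# choices below are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def basis(x, u):
    # Quadratic basis phi(x, u) for scalar state and input (assumed form).
    return np.array([x * x, x * u, u * u])

def collect_data(n=500, dt=0.05):
    # Arbitrary exploratory inputs on assumed dynamics x_dot = -x + u,
    # mimicking the off-policy data collection described in the abstract.
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        u = rng.uniform(-2.0, 2.0)
        x_next = x + dt * (-x + u)       # Euler step of the assumed dynamics
        cost = dt * (x * x + u * u)      # running cost accumulated over dt
        data.append((x, u, cost, x_next))
    return data

def greedy_u(theta, x, grid=np.linspace(-2.0, 2.0, 41)):
    # Minimize the current Q estimate over a coarse control grid (illustrative).
    q = [basis(x, u) @ theta for u in grid]
    return grid[int(np.argmin(q))]

def viql_sketch(data, iters=30, gamma=0.99):
    # Repeatedly fit Q to one-step targets cost + gamma * min_u' Q(x', u')
    # by least squares, a crude stand-in for the weighted-residuals update.
    theta = np.zeros(3)
    for _ in range(iters):
        Phi, targets = [], []
        for x, u, cost, x_next in data:
            u_next = greedy_u(theta, x_next)
            targets.append(cost + gamma * (basis(x_next, u_next) @ theta))
            Phi.append(basis(x, u))
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    return theta

theta = viql_sketch(collect_data())
print("learned Q parameters:", theta)
print("greedy control at x = 1.0:", greedy_u(theta, 1.0))

Under these assumptions, the learned parameters define a greedy policy u(x) = argmin_u Q(x, u), which plays the role of the convergent control policy that is deployed on the real system after offline learning.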
Year
2014
Venue
CoRR
Field
Convergence (routing), Mathematical optimization, Nonlinear system, Optimal control, Computer science, Algorithm, Q-learning, Markov decision process, System model, Reinforcement learning
DocType
Volume
abs/1410.2954
Citations
0
Journal
PageRank
0.34
References
15
Authors
3
Name            Order    Citations    PageRank
Biao Luo        1        554          23.80
Derong Liu      2        5457         286.88
Tingwen Huang   3        5684         310.24