Title
Posterior Sampling-based Reinforcement Learning for Control of Unknown Linear Systems
Abstract
We propose a posterior sampling-based learning algorithm for the linear quadratic (LQ) control problem with unknown system parameters. The algorithm, called posterior sampling-based reinforcement learning for the LQ regulator (PSRL-LQ), uses two stopping criteria to determine the lengths of the dynamic episodes in posterior sampling. The first stopping criterion controls the growth rate of the episode length. The second stopping criterion is triggered when the determinant of the sample covariance matrix falls below half of its previous value. We show, under some conditions on the prior distribution, that the expected (Bayesian) regret of PSRL-LQ accumulated up to time $T$ is bounded by $\tilde{O}(\sqrt{T})$, where $\tilde{O}(\cdot)$ hides constants and logarithmic factors. Numerical simulations are provided to illustrate the performance of PSRL-LQ.
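As a rough illustration of the episode structure described in the abstract (posterior sampling of the system parameters, with two criteria for ending an episode), the following Python sketch simulates PSRL-LQ-style episodes on a toy system. All dimensions, cost weights, the prior, and the "true" dynamics below are assumptions made for this sketch, not values from the paper; the paper also restricts sampling to a set of stabilizable parameters, which is only crudely emulated here by resampling when the Riccati solver fails.

```python
# Illustrative sketch only: assumed toy system, prior, and noise parameters.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
n, m = 2, 1                              # state / input dimensions (assumed)
Q, R = np.eye(n), np.eye(m)              # quadratic cost weights (assumed)
sigma2 = 1.0                             # process-noise variance (assumed)

# "True" system, used here only to simulate trajectories.
A_true = np.array([[1.0, 0.1],
                   [0.0, 1.0]])
B_true = np.array([[0.0],
                   [0.1]])
theta_true = np.hstack([A_true, B_true])   # rows map z = [x; u] to next state

# Gaussian prior over theta (row-wise, shared covariance Sigma).
mu = np.zeros((n, n + m))
Sigma = np.eye(n + m)

def lqr_gain(theta):
    """Certainty-equivalent LQR gain for the sampled parameters, u = -K x."""
    A, B = theta[:, :n], theta[:, n:]
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

x = np.zeros(n)
t, T_prev, horizon = 0, 0, 2000
while t < horizon:
    # New episode: draw theta from the posterior and fix the gain for the episode.
    while True:
        draw = mu + rng.standard_normal((n, n + m)) @ np.linalg.cholesky(Sigma).T
        try:
            K = lqr_gain(draw)
            break
        except (np.linalg.LinAlgError, ValueError):
            continue                     # crude stand-in for restricting to stabilizable draws

    t_start, det_start = t, np.linalg.det(Sigma)
    while t < horizon:
        u = -K @ x
        z = np.concatenate([x, u])
        x_next = theta_true @ z + np.sqrt(sigma2) * rng.standard_normal(n)

        # Conjugate Gaussian update of the posterior over theta.
        Sigma_inv = np.linalg.inv(Sigma)
        Sigma = np.linalg.inv(Sigma_inv + np.outer(z, z) / sigma2)
        mu = (Sigma @ (Sigma_inv @ mu.T + np.outer(z, x_next) / sigma2)).T

        x, t = x_next, t + 1

        # Criterion 1: the episode length may exceed the previous one by at most one step.
        if t - t_start > T_prev:
            break
        # Criterion 2: the covariance determinant drops below half its value at episode start.
        if np.linalg.det(Sigma) < 0.5 * det_start:
            break
    T_prev = t - t_start
```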
Year
2020
DOI
10.1109/TAC.2019.2950156
Venue
IEEE Transactions on Automatic Control
Keywords
Heuristic algorithms, Aerospace electronics, Bayes methods, Adaptive control, Reinforcement learning, Optimal control, Perturbation methods
DocType
Journal
Volume
65
Issue
8
ISSN
0018-9286
Citations
0
PageRank
0.34
References
4
Authors
3
Name            Order  Citations  PageRank
Yi Ouyang       1      43         10.16
Mukul Gagrani   2      16         4.52
Rahul Jain      3      784        71.51