Title
Regret Analysis For Learning In A Multi-Agent Linear-Quadratic Control Problem
Abstract
We consider a multi-agent Linear-Quadratic (LQ) reinforcement learning problem consisting of three systems: an unknown system and two known systems. There are three agents: the actions of agent 1 can affect the unknown system as well as the two known systems, while the actions of agents 2 and 3 can only affect their respective co-located known systems. Further, the unknown system's state can affect the known systems' state evolution. In this paper, we are interested in minimizing the infinite-horizon average cost. We propose a Thompson Sampling (TS)-based multi-agent learning algorithm in which each agent learns the unknown system's dynamics independently. Our result indicates that, under certain assumptions, the expected regret of our algorithm is upper bounded by Õ(√T), where Õ(·) hides constants and logarithmic factors. Numerical simulations are provided to illustrate the performance of our proposed algorithm.
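The abstract describes a Thompson Sampling approach to learning unknown linear dynamics while controlling for quadratic cost. The following is a minimal single-agent, scalar-state sketch of that general idea, not the paper's multi-agent algorithm: the system parameters `a_true`, `b_true`, the episode schedule, the gain clipping, and the exploration dither are all illustrative assumptions. The learner keeps a Gaussian posterior over the dynamics, samples parameters at episode starts, and applies the LQR gain computed for the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative unknown scalar system: x_{t+1} = a*x_t + b*u_t + w_t, w_t ~ N(0,1).
# (Scalar, single-agent simplification; not the paper's three-system setup.)
a_true, b_true = 0.9, 0.5
q, r = 1.0, 1.0  # stage cost q*x^2 + r*u^2

def lqr_gain(a, b, iters=200):
    """Iterate the scalar Riccati equation to a fixed point; return gain k for u = -k*x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * p * b) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# Gaussian posterior over theta = (a, b) with known unit noise variance.
mu = np.zeros(2)
Lam = np.eye(2)  # precision matrix

x, k = 0.0, 0.0
episode_len, T = 20, 2000
for t in range(T):
    if t % episode_len == 0:
        # Thompson step: sample dynamics from the posterior, plan as if they were true.
        theta = rng.multivariate_normal(mu, np.linalg.inv(Lam))
        if abs(theta[1]) > 1e-2:  # skip degenerate (near-uncontrollable) samples
            # Clipping the gain is a crude stabilization guard for this sketch only.
            k = float(np.clip(lqr_gain(theta[0], theta[1]), -3.0, 3.0))
    # Tiny dither keeps (x, u) from being perfectly collinear (identifiability).
    u = -k * x + 0.01 * rng.standard_normal()
    x_next = a_true * x + b_true * u + rng.standard_normal()
    # Bayesian linear-regression update with regressor z = (x, u).
    z = np.array([x, u])
    Lam_new = Lam + np.outer(z, z)
    mu = np.linalg.solve(Lam_new, Lam @ mu + z * x_next)
    Lam = Lam_new
    x = x_next

print(mu)  # posterior mean; with enough data it concentrates near (a_true, b_true)
```

The episodic resampling is what trades exploration against exploitation: each sampled model is optimistic with posterior probability, and the regret analysis in TS-based LQ work bounds the cost of acting on wrong samples as the posterior concentrates.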
Year
2020
DOI
10.23919/ACC45564.2020.9147525
Venue
2020 AMERICAN CONTROL CONFERENCE (ACC)
DocType
Conference
ISSN
0743-1619
Citations
0
PageRank
0.34
References
0
Authors
3
Name | Order | Citations | PageRank
Seyed Mohammad Asghari | 1 | 7 | 3.55
Mukul Gagrani | 2 | 16 | 4.52
Ashutosh Nayyar | 3 | 240 | 30.84