Title
Thompson Sampling For Some Decentralized Control Problems
Abstract
We consider a two-agent team learning problem over an infinite time horizon under two different dynamics and information sharing models: i) Decoupled dynamics with no information sharing, ii) Coupled dynamics with one-step delayed information sharing. The state transition kernels are parametrized by an unknown but fixed parameter taking values in a finite space. We study a decentralized Thompson sampling based approach to learn the underlying parameter where each agent maintains a belief about the underlying parameter. The agents draw a sample from their beliefs at each time and select their action using the benchmark policy for the sampled parameter. We show that under some assumptions on the state transition kernels, the regret achieved by Thompson sampling is upper bounded by a constant independent of the time horizon.
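The sampling loop described in the abstract can be sketched for a single agent as follows. This is only an illustrative sketch under stated assumptions, not the authors' decentralized algorithm: the names `posterior_update`, `thompson_step`, and `run`, the nested-list kernel layout, and the toy kernels are all hypothetical. It assumes a finite parameter set, fully known candidate kernels, and a precomputed benchmark policy for each candidate parameter. In the two-agent setting, each agent would run a loop of this kind on the information available to it.

```python
import random


def posterior_update(belief, kernels, x, u, x_next):
    """Bayes update of a belief over a finite parameter set (hypothetical helper).

    belief[i]  : current probability of candidate parameter i
    kernels[i] : kernels[i][x][u][x_next] = P(x_next | x, u) under parameter i
    """
    weights = [b * k[x][u][x_next] for b, k in zip(belief, kernels)]
    total = sum(weights)
    if total == 0.0:
        # The observation is impossible under every candidate; keep the old belief.
        return list(belief)
    return [w / total for w in weights]


def thompson_step(belief, policies, x, rng):
    """Sample a parameter index from the belief and act with that parameter's policy."""
    idx = rng.choices(range(len(belief)), weights=belief, k=1)[0]
    return idx, policies[idx][x]


def run(true_idx, kernels, policies, horizon, seed=0):
    """Simulate the loop: sample, act, observe the next state, update the belief."""
    rng = random.Random(seed)
    belief = [1.0 / len(kernels)] * len(kernels)  # uniform prior (an assumption)
    x = 0
    for _ in range(horizon):
        _, u = thompson_step(belief, policies, x, rng)
        probs = kernels[true_idx][x][u]  # the true kernel generates the next state
        x_next = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        belief = posterior_update(belief, kernels, x, u, x_next)
        x = x_next
    return belief


# Toy instance (assumed for illustration): 2 states, 2 actions, 2 candidate parameters.
kernels = [
    [[[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.9, 0.1]]],  # parameter 0
    [[[0.1, 0.9], [0.1, 0.9]], [[0.1, 0.9], [0.1, 0.9]]],  # parameter 1
]
policies = [[0, 0], [1, 1]]  # policies[i][x] = action chosen under parameter i
final_belief = run(true_idx=0, kernels=kernels, policies=policies, horizon=200)
```

In this toy instance the two candidate kernels are easy to tell apart, so the belief should concentrate on the true parameter after a modest number of steps. That is the intuition behind a regret bound that does not grow with the horizon, although the paper's assumptions and proof are not reproduced here.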
Year
2018
DOI
10.1109/CDC.2018.8619423
Venue
2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC)
Field
Team learning, Mathematical optimization, Time horizon, Decentralised system, Parametrization, Regret, Computer science, Thompson sampling, Information sharing, Bounded function
DocType
Conference
ISSN
0743-1546
Citations
0
PageRank
0.34
References
0
Authors
2
Name              Order  Citations  PageRank
Mukul Gagrani     1      16         4.52
Ashutosh Nayyar   2      240        30.84