Thompson Sampling For Some Decentralized Control Problems - Citegraph

Paper Info

Title
Thompson Sampling For Some Decentralized Control Problems

Abstract
We consider a two-agent team learning problem over an infinite time horizon under two different dynamics and information sharing models: i) Decoupled dynamics with no information sharing, ii) Coupled dynamics with one-step delayed information sharing. The state transition kernels are parametrized by an unknown but fixed parameter taking values in a finite space. We study a decentralized Thompson sampling based approach to learn the underlying parameter where each agent maintains a belief about the underlying parameter. The agents draw a sample from their beliefs at each time and select their action using the benchmark policy for the sampled parameter. We show that under some assumptions on the state transition kernels, the regret achieved by Thompson sampling is upper bounded by a constant independent of the time horizon.
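The sampling loop described in the abstract can be illustrated with a minimal Python sketch. It covers only the first model (decoupled dynamics, no information sharing): each agent keeps its own belief over a finite parameter set, draws a sample at every step, acts with the benchmark policy for that sample, and updates its belief by Bayes' rule from its own observed transition. The toy kernels, the per-state benchmark policy table, and the names `TSAgent`, `make_kernel` and `simulate` are hypothetical. They are not the paper's model, and the paper's assumptions on the kernels and its regret analysis are not reproduced here.

```python
import random

# Hypothetical toy model: parameters {"a", "b"}, states {0, 1}, actions {0, 1}.
# kernels[theta][x][u] is a probability vector over the next state x'.

def make_kernel(flip):
    """Under flip=False the next state equals u w.p. 0.9; under flip=True it equals 1-u."""
    k = {}
    for x in (0, 1):
        k[x] = {}
        for u in (0, 1):
            target = 1 - u if flip else u
            probs = [0.1, 0.1]
            probs[target] = 0.9
            k[x][u] = probs
    return k

kernels = {"a": make_kernel(False), "b": make_kernel(True)}
# Hypothetical benchmark policy for each parameter: the action that steers toward state 1.
policy = {"a": {0: 1, 1: 1}, "b": {0: 0, 1: 0}}

class TSAgent:
    """One agent running Thompson sampling over a finite parameter set."""

    def __init__(self, thetas, kernels, policy, rng):
        self.thetas = list(thetas)
        self.kernels = kernels
        self.policy = policy
        self.rng = rng
        # Uniform prior over the finite parameter space.
        self.belief = {th: 1.0 / len(self.thetas) for th in self.thetas}

    def act(self, x):
        # Draw one parameter from the current belief and act as if it were the true one.
        weights = [self.belief[th] for th in self.thetas]
        theta_hat = self.rng.choices(self.thetas, weights=weights, k=1)[0]
        return self.policy[theta_hat][x]

    def update(self, x, u, x_next):
        # Bayes' rule: b'(theta) is proportional to b(theta) * P_theta(x_next | x, u).
        post = {th: self.belief[th] * self.kernels[th][x][u][x_next] for th in self.thetas}
        z = sum(post.values())
        if z > 0:
            self.belief = {th: p / z for th, p in post.items()}

def simulate(theta_star="a", steps=200, seed=0):
    """Two agents with decoupled dynamics; neither shares information with the other."""
    rng = random.Random(seed)
    agents = [TSAgent(kernels.keys(), kernels, policy, rng) for _ in range(2)]
    states = [0, 0]
    for _ in range(steps):
        for i, agent in enumerate(agents):
            u = agent.act(states[i])
            probs = kernels[theta_star][states[i]][u]
            x_next = rng.choices([0, 1], weights=probs, k=1)[0]
            agent.update(states[i], u, x_next)  # each agent learns only from its own transition
            states[i] = x_next
    return [agent.belief for agent in agents]

beliefs = simulate()
print(beliefs)
```

In this toy model every action is informative, because the two parameters predict opposite next states, so each agent's belief concentrates on the true parameter. The coupled model with one-step delayed information sharing would additionally have each agent condition on the other agent's delayed observations. That case is not sketched here.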

Year: 2018
DOI: 10.1109/CDC.2018.8619423
Venue: 2018 IEEE Conference on Decision and Control (CDC)
Field: Team learning, Mathematical optimization, Time horizon, Decentralised system, Parametrization, Regret, Computer science, Thompson sampling, Information sharing, Bounded function
DocType: Conference
ISSN: 0743-1546
Citations: 0
PageRank: 0.34
References: 0
Authors: 2

Authors

Name	Order	Citations	PageRank
Mukul Gagrani	1	16	4.52
Ashutosh Nayyar	2	240	30.84