Abstract |
---|
We consider a two-agent team learning problem over an infinite time horizon under two different models of dynamics and information sharing: (i) decoupled dynamics with no information sharing, and (ii) coupled dynamics with one-step delayed information sharing. The state transition kernels are parametrized by an unknown but fixed parameter taking values in a finite space. We study a decentralized approach based on Thompson sampling for learning this parameter, in which each agent maintains its own belief about the parameter. At each time step, each agent draws a sample from its belief and selects its action using the benchmark policy for the sampled parameter. We show that, under some assumptions on the state transition kernels, the regret achieved by Thompson sampling is upper bounded by a constant independent of the time horizon. |
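The sample-then-act loop described in the abstract can be illustrated with a minimal Python sketch for a single agent under decoupled dynamics. Everything in it is a hypothetical stand-in rather than the paper's model: the two-state, two-action kernel `KERNEL`, the lookup table `POLICY` playing the role of the benchmark policy, and the function names are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy model (not from the paper): 2 states, 2 actions,
# and a finite parameter space {0, 1}.
# KERNEL[theta][state][action] = probability that the next state is 1.
KERNEL = {
    0: [[0.9, 0.2], [0.8, 0.1]],
    1: [[0.2, 0.9], [0.1, 0.8]],
}
# Hypothetical benchmark policy: POLICY[theta][state] -> action.
POLICY = {0: [0, 0], 1: [1, 1]}


def transition_prob(theta, state, action, next_state):
    """Probability of moving to next_state under parameter theta."""
    p_one = KERNEL[theta][state][action]
    return p_one if next_state == 1 else 1.0 - p_one


def bayes_update(belief, state, action, next_state):
    """Posterior over the finite parameter set after one observed transition."""
    unnorm = {
        theta: weight * transition_prob(theta, state, action, next_state)
        for theta, weight in belief.items()
    }
    total = sum(unnorm.values())
    return {theta: weight / total for theta, weight in unnorm.items()}


def run_agent(true_theta, horizon, rng):
    """Run Thompson sampling for one agent and return its final belief."""
    belief = {theta: 1.0 / len(KERNEL) for theta in KERNEL}  # uniform prior
    state = 0
    for _ in range(horizon):
        thetas = list(belief)
        # Draw a parameter from the current belief.
        sampled = rng.choices(thetas, weights=[belief[t] for t in thetas])[0]
        # Act as the benchmark policy would if the sample were the truth.
        action = POLICY[sampled][state]
        # The environment evolves under the true (unknown) parameter.
        p_one = KERNEL[true_theta][state][action]
        next_state = 1 if rng.random() < p_one else 0
        belief = bayes_update(belief, state, action, next_state)
        state = next_state
    return belief
```

Calling `run_agent(1, 200, random.Random(0))` returns the agent's belief after 200 steps. With decoupled dynamics and no information sharing, a second agent could run the same loop independently on its own state; the coupled, delayed-sharing case would need a belief update over shared observations, which this sketch does not attempt.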
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/CDC.2018.8619423 | 2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC) |
Field | DocType | ISSN |
---|---|---|
Team learning, Mathematical optimization, Time horizon, Decentralised system, Parametrization, Regret, Computer science, Thompson sampling, Information sharing, Bounded function | Conference | 0743-1546 |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mukul Gagrani | 1 | 16 | 4.52 |
Ashutosh Nayyar | 2 | 240 | 30.84 |