Title |
---|
Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds |
Abstract |
---|
Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space, offering a template for reinforcement learning that operates under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent setting. We present a new template, MCES-IP, which extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES-IP is instantiated to be approximately locally optimal with some probability by deriving a theoretical bound on the sample size that depends in part on the allowed sampling error; we refer to this algorithm as MCESIP+PAC. Our experiments demonstrate that MCESIP+PAC learns policies whose values are comparable to or better than those from MCESP+PAC in multiagent domains, while using far fewer samples per transformation. |
Year | DOI | Venue |
---|---|---|
2016 | 10.5555/2936924.2937002 | AAMAS |

Field | DocType | Citations |
---|---|---|
Mathematical optimization, Observability, Temporal difference learning, Monte Carlo method, Observable, Probably approximately correct learning, Computer science, Markov decision process, Artificial intelligence, Local search (optimization), Machine learning, Reinforcement learning | Conference | 2 |

PageRank | References | Authors |
---|---|---|
0.37 | 4 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Roi Ceren | 1 | 5 | 2.23 |
Prashant Doshi | 2 | 926 | 90.23 |
Bikramjit Banerjee | 3 | 284 | 32.63 |