| Title |
|---|
| Concurrent reinforcement learning as a rehearsal for decentralized planning under uncertainty |
| Abstract |
|---|
| Dec-POMDPs are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Recently, reinforcement learning (RL) based approaches have been proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model. These methods assume that agents have only local information available to them during the learning process, i.e., that conditions during learning and policy execution are identical. However, in practical scenarios this may not be the case, and agents may have difficulty learning under such unnecessary constraints. We propose a novel RL approach in which agents are allowed to *rehearse* with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on this information. We show experimentally that incorporating such information can ameliorate the difficulties faced by non-rehearsal-based learners, and demonstrate fast, (near) optimal performance on many existing benchmark Dec-POMDP problems. We also propose a new benchmark domain that is less abstract than existing domains and is designed to be particularly challenging to RL-based solvers, as a target for current and future research on RL solutions to Dec-POMDPs. |
| Year | DOI | Venue |
|---|---|---|
| 2013 | 10.5555/2484920.2485188 | AAMAS |
| Keywords | Field | DocType |
|---|---|---|
| policy execution, full knowledge, decentralized planning, reinforcement learning, RL solution, full prior knowledge, new benchmark domain, prevalent Dec-POMDP solution technique, novel RL approach, existing benchmark Dec-POMDP problem, concurrent reinforcement, local information | Decentralized planning, Computer science, Artificial intelligence, Error-driven learning, Difficulty learning, Machine learning, Computation, Reinforcement learning | Conference |
| Citations | PageRank | References |
|---|---|---|
| 1 | 0.37 | 2 |
| Authors |
|---|
| 2 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Landon Kraemer | 1 | 89 | 10.03 |
| Bikramjit Banerjee | 2 | 284 | 32.63 |