Abstract | ||
---|---|---|
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable to others. For instance, hysteretic Q-learning [15] addresses miscoordination while leaving agents vulnerable to misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously [29]. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game [3], within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near-optimal joint policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint policies in our environment, overcoming the outlined pathologies. |
Year | DOI | Venue |
---|---|---|
2018 | 10.5555/3306127.3331672 | Adaptive Agents and Multi-Agent Systems |
Keywords | Field | DocType |
---|---|---|
Deep Multi-Agent Reinforcement Learning | Small number, Multiple pathologies, Overgeneralisation, Computer science, Markov chain, Artificial intelligence, Probabilistic logic, Machine learning, Management science, Reinforcement learning | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1809.05096 | 0 | 0.34 |
References | Authors |
---|---|
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gregory Palmer | 1 | 6 | 1.10 |
Rahul Savani | 2 | 243 | 30.09 |
Karl Tuyls | 3 | 1272 | 127.83 |