Negative Update Intervals in Deep Multi-Agent Reinforcement Learning - Citegraph

Paper Info

Title
Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

Abstract
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable to others. For instance, hysteretic Q-learning [15] addresses miscoordination while leaving agents vulnerable to misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously [29]. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game [3], within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near-optimal joint policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint policies in our environment, overcoming the outlined pathologies.
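
The hysteretic Q-learning baseline mentioned in the abstract can be illustrated with a minimal tabular sketch. It is not the paper's NUI-DDQN and not code from the authors: the function name `hysteretic_update` and the default rates `alpha` and `beta` are assumptions chosen for illustration. The idea shown is the standard one: positive temporal-difference errors are applied with a larger learning rate than negative ones, so an agent stays optimistic when a teammate's exploration causes a low reward.

```python
def hysteretic_update(q, target, alpha=0.1, beta=0.01):
    """Return the Q-value after one hysteretic update (illustrative sketch).

    q      -- current Q-value estimate for a state-action pair
    target -- bootstrapped target, e.g. r + gamma * max_a' Q(s', a')
    alpha  -- learning rate used when the TD error is non-negative (assumed value)
    beta   -- smaller learning rate used when the TD error is negative (assumed value)
    """
    delta = target - q                    # temporal-difference error
    rate = alpha if delta >= 0 else beta  # optimistic: damp negative updates
    return q + rate * delta


if __name__ == "__main__":
    q = 0.0
    # A high target moves the estimate quickly, a low one only slightly.
    print(hysteretic_update(q, 10.0))
    print(hysteretic_update(q, -10.0))
```

Because negative updates are damped rather than filtered, a rare high reward from a stochastic action can remain overestimated for a long time, which is the vulnerability to misleading stochastic rewards that the abstract refers to.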

Year	DOI	Venue
2018	10.5555/3306127.3331672	adaptive agents and multi-agents systems
Keywords	Field	DocType
Deep Multi-Agent Reinforcement Learning	Small number,Multiple pathologies,Overgeneralisation,Computer science,Markov chain,Artificial intelligence,Probabilistic logic,Machine learning,Management science,Reinforcement learning	Journal
Volume	Citations	PageRank
abs/1809.05096	0	0.34
References	Authors
0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Gregory Palmer	1	6	1.10
Rahul Savani	2	243	30.09
Karl Tuyls	3	1272	127.83
