Title
Efficient Policy Detection and Reuse for Non-Stationarity in Markov Games
Abstract
A challenging problem in multiagent systems is to cooperate or compete with non-stationary agents that change their behavior from time to time. An agent in such a non-stationary environment must be able to quickly detect the other agents' current policy during online interaction and then adapt its own policy accordingly. This article studies efficient policy detection and reuse techniques for playing against non-stationary agents in cooperative or competitive Markov games. We propose a new deep Bayesian policy reuse algorithm, called DPN-BPR+, which extends the recent BPR+ algorithm with a neural network as the value-function approximator. To detect the other agents' policy accurately, we propose a rectified belief model that leverages an opponent model to infer the other agents' policy from both reward signals and their observed behavior. Instead of directly storing individual policies as BPR+ does, we introduce a distilled policy network that serves as the policy library and use policy distillation to achieve efficient online policy learning and reuse. DPN-BPR+ inherits all the advantages of BPR+. In experiments, we evaluate DPN-BPR+ in terms of detection accuracy, cumulative reward and speed of convergence on four complex Markov games with raw visual inputs, including two cooperative games and two competitive games. Empirical results show that DPN-BPR+ outperforms existing algorithms in all of these Markov games.
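To make the detection idea concrete, below is a minimal sketch of a BPR-style Bayesian belief update over a small library of known opponent policies, where reward-based and behavior-based likelihoods (the intuition behind the rectified belief model) are combined and renormalized. This is an illustrative assumption-laden sketch, not the paper's implementation; all function and variable names (update_belief, reward_lik, behavior_lik, etc.) are hypothetical.

```python
# Illustrative sketch only: a BPR-style belief update over a discrete policy
# library, combining reward evidence with opponent-model (behavior) evidence.
import numpy as np

def update_belief(belief, reward_lik, behavior_lik, eps=1e-8):
    """Posterior b'(tau) proportional to P(r | tau) * P(a_opp | tau) * b(tau)."""
    posterior = belief * reward_lik * behavior_lik
    posterior = np.maximum(posterior, eps)   # keep beliefs from collapsing to zero
    return posterior / posterior.sum()

# Example with three candidate opponent policies in the library.
belief = np.array([1/3, 1/3, 1/3])           # uniform prior over the library
reward_lik = np.array([0.1, 0.6, 0.3])       # P(observed return | opponent policy)
behavior_lik = np.array([0.2, 0.7, 0.1])     # P(observed opponent actions | policy)
belief = update_belief(belief, reward_lik, behavior_lik)
best_response_idx = int(np.argmax(belief))   # policy in the library to reuse against
```

Under this kind of update, the agent reuses the stored response policy associated with the most probable opponent policy, and switches when the belief shifts.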
Year
2021
DOI
10.1007/s10458-020-09480-9
Venue
AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS
Keywords
Non-stationary agents, Deep reinforcement learning, Opponent modeling, Bayesian policy reuse
DocType
Journal
Volume
35
Issue
1
ISSN
1387-2532
Citations
1
PageRank
0.40
References
35
Authors
7
Name             Order  Citations  PageRank
Yan Zheng        1      14         4.22
Jianye Hao       2      189        55.78
Zongzhang Zhang  3      36         10.71
Zhaopeng Meng    4      79         15.68
Tianpei Yang     5      13         6.43
Yanran Li        6      1          1.07
Changjie Fan     7      57         21.37