Title
Efficient Policy Detection and Reuse for Non-Stationarity in Markov Games
Abstract
A challenging problem in multiagent systems is to cooperate or compete with non-stationary agents that change their behavior from time to time. An agent in such a non-stationary environment must be able to quickly detect the other agents' current policy during online interaction and then adapt its own policy accordingly. This article studies efficient policy detection and reuse techniques for playing against non-stationary agents in cooperative or competitive Markov games. We propose a new deep Bayesian policy reuse algorithm, called DPN-BPR+, which extends the recent BPR+ algorithm with a neural network as the value-function approximator. To detect the other agents' policy accurately, we propose a rectified belief model that leverages an opponent model to infer the other agents' policy from both reward signals and their observed behavior. Instead of directly storing individual policies as BPR+ does, we introduce a distilled policy network that serves as the policy library and use policy distillation to achieve efficient online policy learning and reuse. DPN-BPR+ inherits all the advantages of BPR+. In experiments, we evaluate DPN-BPR+ in terms of detection accuracy, cumulative reward and speed of convergence on four complex Markov games with raw visual inputs, including two cooperative games and two competitive games. Empirical results show that DPN-BPR+ outperforms existing algorithms in all of these Markov games.
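To make the detection idea concrete, below is a minimal sketch of a BPR-style Bayesian belief update over a small library of known opponent policies, where reward-based and behavior-based likelihoods (the intuition behind the rectified belief model) are combined and renormalized. This is an illustrative assumption-laden sketch, not the paper's implementation; all function and variable names (update_belief, reward_lik, behavior_lik, etc.) are hypothetical.

```python
# Illustrative sketch only: a BPR-style belief update over a discrete policy
# library, combining reward evidence with opponent-model (behavior) evidence.
import numpy as np

def update_belief(belief, reward_lik, behavior_lik, eps=1e-8):
    """Posterior b'(tau) proportional to P(r | tau) * P(a_opp | tau) * b(tau)."""
    posterior = belief * reward_lik * behavior_lik
    posterior = np.maximum(posterior, eps)   # keep beliefs from collapsing to zero
    return posterior / posterior.sum()

# Example with three candidate opponent policies in the library.
belief = np.array([1/3, 1/3, 1/3])           # uniform prior over the library
reward_lik = np.array([0.1, 0.6, 0.3])       # P(observed return | opponent policy)
behavior_lik = np.array([0.2, 0.7, 0.1])     # P(observed opponent actions | policy)
belief = update_belief(belief, reward_lik, behavior_lik)
best_response_idx = int(np.argmax(belief))   # policy in the library to reuse against
```

Under this kind of update, the agent reuses the stored response policy associated with the most probable opponent policy, and switches when the belief shifts.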
Year
2021
DOI
10.1007/s10458-020-09480-9
Venue
AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS
Keywords
Non-stationary agents, Deep reinforcement learning, Opponent modeling, Bayesian policy reuse
DocType
Journal
Volume
35
Issue
1
ISSN
1387-2532
Citations
1
PageRank
0.40
References
35
Authors
7
Name             Order  Citations  PageRank
Yan Zheng        1      14         4.22
Jianye Hao       2      189        55.78
Zongzhang Zhang  3      36         10.71
Zhaopeng Meng    4      79         15.68
Tianpei Yang     5      13         6.43
Yanran Li        6      1          1.07
Changjie Fan     7      57         21.37