Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning - Citegraph

Paper Info

Title
Decentralized optimal large scale multi-player pursuit-evasion strategies: A mean field game approach with reinforcement learning

Abstract
In this paper, the intelligent design for the pursuit-evasion game with large scale multi-pursuer and multi-evader has been investigated. Due to the vast number of agents, the notorious ”Curse of Dimensionality” can seriously challenge the traditional design in multi-player pursuit-evasion game, especially under harsh environment with limited communication resource to support information exchange among multi-players. To address this intractable challenge, the emerging Mean Field Games (MFG) theory has been utilized to solve the optimal pursuit-evasion strategies based on a new form of probability density function (PDF) instead of detailed information from all the other players/agents. As such, not only the information exchange is reduced, but also the computation dimension for the optimal strategy derivation is decreased. Specifically, the MFG has been integrated into the pursuit-evasion game to generate a hierarchical structure where the pursuers and the evaders form two mean field groups separately. To online solve the mean field equations, i.e., two coupled partial differential equations, the actor-critic reinforcement learning mechanism is adopted and further extended to a novel actor-critic-mass-opponent (ACMO) approach. In ACMO, the actor neural network estimates the optimal control, the critic neural network approximates the optimal cost function, the mass neural network learns the agent’s group PDF, and the opponent neural network predicts the opponents’ average states in the form of PDF that causes maximum cost for the agent’s group. The Lyapunov theory is utilized to provide the convergence analysis for all neural networks and the stability analysis for the closed-loop system. Eventually, a series of numerical simulations are conducted to demonstrate the effectiveness of the developed scheme.

Year	DOI	Venue
2022	10.1016/j.neucom.2021.01.141	Neurocomputing
Keywords	DocType	Volume
Approximate dynamic programming,Optimal control,Mean field game theory,Pursuit-evasion game,Reinforcement learning	Journal	484
ISSN	Citations	PageRank
0925-2312	0	0.34
References	Authors
0	2

Authors (2 rows)

Cited by (0 rows)

References (0 rows)

Name	Order	Citations	PageRank
Zejian Zhou	1	2	3.42
Hao Xu	2	0	0.34

1