Title
A Multi-Agent Off-Policy Actor-Critic Algorithm For Distributed Reinforcement Learning
Abstract
This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm. An empirical validation of these theoretical results is given. Copyright (C) 2020 The Authors.
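As a rough illustration of the policy-evaluation building block described in the abstract, the sketch below combines a standard emphatic TD(0) update under linear function approximation with a neighbor-averaging consensus step over a doubly stochastic mixing matrix. It is a generic, hedged sketch under assumed names (n_agents, feat_dim, the random MDP, the fixed ring graph), not the paper's algorithm, which handles time-varying communication graphs and the full off-policy actor-critic loop.

```python
# Illustrative sketch only (hypothetical setup, fixed ring graph), not the paper's
# exact method: each agent applies an emphatic TD(0) update with linear function
# approximation using its own local reward, then averages its weight vector with
# its neighbors' through a doubly stochastic consensus matrix.
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_states, n_actions, feat_dim = 4, 6, 2, 3
gamma, alpha = 0.9, 0.01

phi = rng.normal(size=(n_states, feat_dim))                        # state features
behavior = np.full((n_states, n_actions), 1.0 / n_actions)         # behavior policy mu
target = np.full((n_states, n_actions), 1.0 / n_actions)           # target policy pi
target[:, 0] += 0.1                                                # slightly different from mu
target[:, 1] -= 0.1
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel
R = rng.normal(size=(n_agents, n_states, n_actions))               # local reward for each agent

# Doubly stochastic mixing weights for a ring graph; the paper allows the
# communication graph (and hence this matrix) to vary over time.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

theta = np.zeros((n_agents, feat_dim))   # one value-function weight vector per agent
F, rho_prev, s = 1.0, 1.0, 0             # follow-on trace, previous IS ratio, current state

for _ in range(5000):
    a = rng.choice(n_actions, p=behavior[s])
    s_next = rng.choice(n_states, p=P[s, a])
    rho = target[s, a] / behavior[s, a]            # importance sampling ratio
    F = gamma * rho_prev * F + 1.0                 # follow-on (emphasis) trace, interest = 1

    theta = W @ theta                              # consensus: mix with neighbors' weights
    for i in range(n_agents):
        delta = R[i, s, a] + gamma * phi[s_next] @ theta[i] - phi[s] @ theta[i]
        theta[i] += alpha * rho * F * delta * phi[s]   # emphatic TD(0) correction

    rho_prev, s = rho, s_next

# After many steps the agents' weight vectors should approximately agree.
print("max disagreement:", np.max(np.linalg.norm(theta - theta.mean(axis=0), axis=1)))
```

The consensus step is what couples the agents: each one only ever observes its own local reward, yet repeated mixing through the doubly stochastic matrix drives the local weight vectors toward a common value estimate.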
Year
2020
DOI
10.1016/j.ifacol.2020.12.2021
Venue
IFAC-PapersOnLine
Keywords
consensus and reinforcement learning control, adaptive control of multi-agent systems
DocType
Journal
Volume
53
Issue
2
ISSN
2405-8963
Citations
0
PageRank
0.34
References
13
Authors
6
Name             Order  Citations  PageRank
Wesley Suttle    1      0          0.34
Zhuoran Yang     2      52         29.86
Kaiqing Zhang    3      48         13.02
Zhaoran Wang     4      157        33.20
Tamer Basar      5      3497       402.11
Ji Liu           6      146        26.61