| Title |
|---|
| A Multi-Agent Off-Policy Actor-Critic Algorithm For Distributed Reinforcement Learning |
| Abstract |
|---|
| This paper extends off-policy reinforcement learning to the multi-agent case, in which a set of networked agents, communicating with their neighbors according to a time-varying graph, collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation and proves its convergence under linear function approximation. The paper then leverages this result, together with a novel multi-agent off-policy policy gradient theorem and recent work on both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm. An empirical validation of these theoretical results is given. Copyright (C) 2020 The Authors. |
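For context on the critic update the abstract refers to, below is a minimal single-agent sketch of emphatic TD(λ) with linear function approximation (following Sutton, Mahmood, and White's formulation), plus a simple consensus-averaging step of the kind commonly used by networked agents in distributed RL. This is an illustrative assumption, not the paper's exact algorithm: the function names `etd_step` and `consensus`, the step sizes, and the doubly stochastic weight matrix `W` are all hypothetical choices for the sketch.

```python
import numpy as np

def etd_step(theta, e, F, phi_s, phi_s2, reward, rho, rho_prev,
             gamma=0.95, lam=0.0, alpha=0.01, interest=1.0):
    """One emphatic TD(lambda) update with linear features (sketch only).

    rho      : importance ratio pi(a|s) / mu(a|s) at the current step
    rho_prev : importance ratio from the previous step (0 on the first step)
    """
    F = rho_prev * gamma * F + interest        # follow-on trace
    M = lam * interest + (1.0 - lam) * F       # emphasis weighting
    e = rho * (gamma * lam * e + M * phi_s)    # emphatic eligibility trace
    delta = reward + gamma * theta @ phi_s2 - theta @ phi_s  # TD error
    theta = theta + alpha * delta * e          # critic parameter update
    return theta, e, F

def consensus(thetas, W):
    """Mix agents' critic parameters with a doubly stochastic matrix W.

    thetas : (n_agents, d) array, one parameter vector per agent
    W      : (n_agents, n_agents) consensus weights for the current graph
    """
    return W @ thetas
```

In a hypothetical multi-agent loop, each agent would run `etd_step` on its own transition and importance ratios, and all agents would then call `consensus(thetas, W)` with weights matching the current communication graph; the paper's actual multi-agent scheme and its convergence conditions are more involved than this sketch.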
| Year | DOI | Venue |
|---|---|---|
| 2019 | 10.1016/j.ifacol.2020.12.2021 | IFAC-PapersOnLine |

| Keywords | DocType | Volume |
|---|---|---|
| consensus and reinforcement learning control, adaptive control of multi-agent systems | Journal | 53 |

| Issue | ISSN | Citations |
|---|---|---|
| 2 | 2405-8963 | 0 |

| PageRank | References | Authors |
|---|---|---|
| 0.34 | 13 | 6 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Wesley Suttle | 1 | 0 | 0.34 |
| Zhuoran Yang | 2 | 52 | 29.86 |
| Kaiqing Zhang | 3 | 48 | 13.02 |
| Zhaoran Wang | 4 | 157 | 33.20 |
| Tamer Basar | 5 | 3497 | 402.11 |
| Ji Liu | 6 | 146 | 26.61 |