Difference Advantage Estimation for Multi-Agent Policy Gradients. - Citegraph

Paper Info

Title
Difference Advantage Estimation for Multi-Agent Policy Gradients.

Abstract
Multi-agent policy gradient methods in centralized training with decentralized execution have recently seen considerable progress. During centralized training, multi-agent credit assignment is crucial and can substantially improve learning performance. However, explicit multi-agent credit assignment in multi-agent policy gradient methods still receives little attention. In this paper, we investigate multi-agent credit assignment induced by reward shaping and provide a theoretical understanding in terms of its credit assignment and policy bias. Based on this, we propose an exponentially weighted advantage estimator, which is analogous to GAE, to enable multi-agent credit assignment while allowing a tradeoff with policy bias. Empirical results show that our approach performs effective multi-agent credit assignment and thus substantially outperforms other advantage estimators.
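
For readers unfamiliar with GAE (generalized advantage estimation), which the abstract names as the analogue of the proposed estimator, the following is a minimal sketch of the standard single-agent version only. It is not the paper's difference advantage estimator, whose details are not given in this record. The function name `gae_advantages`, its parameters, and the default values of `gamma` and `lam` are illustrative assumptions.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard single-agent GAE over one trajectory (illustrative sketch).

    rewards: list of T per-step rewards.
    values:  list of T + 1 value estimates; the last entry is the
             bootstrap value of the state after the final step.
    gamma:   discount factor (assumed default).
    lam:     exponential weighting parameter (assumed default).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated sum from step t + 1.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of this and all later residuals.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces each advantage to the one-step TD residual, while `lam=1` gives the full discounted sum of residuals; intermediate values trade bias against variance in the single-agent setting.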

Year	Venue	DocType
2022	International Conference on Machine Learning	Conference
Citations	PageRank	References	Authors
0	0.34	0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Yueheng Li	1	0	1.01
Guangming Xie	2	1276	96.56
Zongqing Lu	3	209	26.18
