Counterfactual Multi-Agent Policy Gradients
Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
Code
- github.com/puyuan1996/MARL (PyTorch, ★ 41)
- github.com/nice-hku/cl2marl-smac (PyTorch, ★ 14)
- github.com/hanhanAnderson/LSF-SAC (PyTorch, ★ 5)
- github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py (PyTorch, ★ 0)
- github.com/matteokarldonati/Counterfactual-Multi-Agent-Policy-Gradients (PyTorch, ★ 0)
- github.com/TonghanWang/NDQ (PyTorch, ★ 0)
- github.com/gingkg/smac (PyTorch, ★ 0)
Abstract
Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best-performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
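The counterfactual baseline is compact enough to sketch in code. Assuming the centralised critic returns, in a single forward pass, a Q-value for every action of one agent (with the state and the other agents' actions encoded in its input, as the abstract describes), the baseline is the policy-weighted average of those Q-values, and the advantage is the Q-value of the taken action minus that baseline. This is a minimal illustrative sketch, not the authors' released code; the function name coma_advantage and all tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def coma_advantage(q_values: torch.Tensor,
                   logits: torch.Tensor,
                   actions: torch.Tensor) -> torch.Tensor:
    """Counterfactual advantage for a single agent (illustrative sketch).

    q_values: (batch, n_actions) -- critic output Q(s, (u_a, u_{-a})) for
              every action u_a of this agent, other agents' actions fixed;
              obtained in one critic forward pass.
    logits:   (batch, n_actions) -- the decentralised actor's policy logits.
    actions:  (batch,)           -- the action this agent actually took.
    """
    pi = F.softmax(logits, dim=-1)                    # pi(u_a | tau_a)
    baseline = (pi * q_values).sum(dim=-1)            # b = sum_u pi(u) Q(s, (u, u_{-a}))
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    return q_taken - baseline                         # A(s, u) = Q(s, u) - b


# Usage: the COMA policy gradient treats the advantage as a fixed
# coefficient (hence .detach()) on the log-probability of the taken action.
batch, n_actions = 32, 5
q_values = torch.randn(batch, n_actions)              # stand-in critic output
logits = torch.randn(batch, n_actions, requires_grad=True)
actions = torch.randint(n_actions, (batch,))

adv = coma_advantage(q_values, logits, actions)
log_pi = F.log_softmax(logits, dim=-1).gather(-1, actions.unsqueeze(-1)).squeeze(-1)
loss = -(adv.detach() * log_pi).mean()                # actor loss
loss.backward()
```

Because the critic scores all of one agent's actions at once, the baseline costs no extra critic evaluations, which is the efficiency point the abstract makes about computing the counterfactual baseline in a single forward pass.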