A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning
Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee
Code
- github.com/j3soon/dfac-extended (official, in paper, PyTorch, ★ 7)
Abstract
In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address these issues, we propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. Then, we perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge and six self-designed Ultra Hard maps, showing that DFAC can outperform a number of baselines.
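To make "factorization of return distributions" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each agent's return distribution is represented by quantile values at shared quantile fractions, and that the joint distribution is built by splitting each agent's estimate into a mean and a zero-mean shape, mixing the means with an expected-value factorization function (a VDN-style sum here), and summing the shapes quantile by quantile. The function name `factorize_mean_shape` and the parameter `mix_means` are hypothetical.

```python
import numpy as np


def factorize_mean_shape(agent_quantiles, mix_means=np.sum):
    """Combine per-agent quantile estimates into joint quantile estimates.

    agent_quantiles: array-like of shape (n_agents, n_quantiles); row k holds
        agent k's return quantiles at quantile fractions shared by all agents.
    mix_means: hypothetical stand-in for an expected-value factorization
        function; the default is a plain sum of the per-agent means.
    """
    agent_quantiles = np.asarray(agent_quantiles, dtype=float)

    # Mean part: each agent's expected return.
    agent_means = agent_quantiles.mean(axis=1)

    # Shape part: zero-mean deviation of each quantile from the agent's mean.
    agent_shapes = agent_quantiles - agent_means[:, None]

    # Joint mean from the expected-value factorization (assumed: a sum).
    joint_mean = mix_means(agent_means)

    # Joint shape as the quantile-wise sum of the per-agent shapes.
    joint_shape = agent_shapes.sum(axis=0)

    return joint_mean + joint_shape


# Two agents, three quantiles: one stochastic agent, one deterministic agent.
example = factorize_mean_shape([[0.0, 1.0, 2.0], [10.0, 10.0, 10.0]])
```

With the default sum, the mean of the joint quantiles equals the sum of the per-agent means, so the expected-value factorization is preserved while the spread of the joint distribution comes only from the shape terms. Swapping `mix_means` for another mixing function changes only the mean part in this sketch.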
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| SMAC 26m_vs_30m | DMIX | Average Score | 19.17 | — | Unverified |
| SMAC 26m_vs_30m | QMIX | Average Score | 18.23 | — | Unverified |
| SMAC 26m_vs_30m | DPLEX | Average Score | 18.49 | — | Unverified |
| SMAC 26m_vs_30m | DDN | Average Score | 18.49 | — | Unverified |
| SMAC 26m_vs_30m | QPLEX | Average Score | 18.66 | — | Unverified |
| SMAC 26m_vs_30m | VDN | Average Score | 16.69 | — | Unverified |
| SMAC 27m_vs_30m | QPLEX | Median Win Rate | 78.12 | — | Unverified |
| SMAC 27m_vs_30m | DPLEX | Median Win Rate | 90.62 | — | Unverified |
| SMAC 3s5z_vs_3s6z | QPLEX | Median Win Rate | 84.38 | — | Unverified |
| SMAC 3s5z_vs_3s6z | DPLEX | Median Win Rate | 90.62 | — | Unverified |
| SMAC 3s5z_vs_4s6z | DDN | Average Score | 19.65 | — | Unverified |
| SMAC 3s5z_vs_4s6z | DMIX | Average Score | 18.61 | — | Unverified |
| SMAC 3s5z_vs_4s6z | VDN | Average Score | 17.16 | — | Unverified |
| SMAC 3s5z_vs_4s6z | DPLEX | Average Score | 14.99 | — | Unverified |
| SMAC 3s5z_vs_4s6z | QPLEX | Average Score | 13.6 | — | Unverified |
| SMAC 3s5z_vs_4s6z | QMIX | Average Score | 13.09 | — | Unverified |
| SMAC 6h_vs_8z | DPLEX | Median Win Rate | 43.75 | — | Unverified |
| SMAC 6h_vs_8z | QPLEX | Average Score | 15.95 | — | Unverified |
| SMAC 6h_vs_9z | QMIX | Average Score | 12.37 | — | Unverified |
| SMAC 6h_vs_9z | VDN | Average Score | 13.57 | — | Unverified |
| SMAC 6h_vs_9z | QPLEX | Average Score | 13.86 | — | Unverified |
| SMAC 6h_vs_9z | DMIX | Average Score | 13.73 | — | Unverified |
| SMAC 6h_vs_9z | DDN | Average Score | 16 | — | Unverified |
| SMAC 6h_vs_9z | DPLEX | Average Score | 14.84 | — | Unverified |
| SMAC corridor | DPLEX | Median Win Rate | 81.25 | — | Unverified |
| SMAC corridor | QPLEX | Median Win Rate | 75 | — | Unverified |
| SMAC corridor_2z_vs_24zg | VDN | Average Score | 7.78 | — | Unverified |
| SMAC corridor_2z_vs_24zg | DMIX | Average Score | 7.41 | — | Unverified |
| SMAC corridor_2z_vs_24zg | QPLEX | Average Score | 6.44 | — | Unverified |
| SMAC corridor_2z_vs_24zg | QMIX | Average Score | 4.8 | — | Unverified |
| SMAC corridor_2z_vs_24zg | DDN | Average Score | 11.1 | — | Unverified |
| SMAC corridor_2z_vs_24zg | DPLEX | Average Score | 10.71 | — | Unverified |
| SMAC MMM2 | DPLEX | Median Win Rate | 96.88 | — | Unverified |
| SMAC MMM2 | QPLEX | Median Win Rate | 96.88 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | DDN | Average Score | 16.5 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | DMIX | Average Score | 16.24 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | DPLEX | Average Score | 15.89 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | QPLEX | Average Score | 15.52 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | QMIX | Average Score | 14.4 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | VDN | Average Score | 13.13 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | VDN | Average Score | 17.3 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | QMIX | Average Score | 19.01 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | QPLEX | Average Score | 19.06 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | DMIX | Average Score | 19.33 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | DPLEX | Average Score | 19.4 | — | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | DDN | Average Score | 19.45 | — | Unverified |