A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning

2023-06-04 · Code Available

Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

Abstract

In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of other agents. To address these issues, we propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods. This framework generalizes expected value function factorization methods to enable the factorization of return distributions. To validate DFAC, we first demonstrate its ability to factorize the value functions of a simple matrix game with stochastic rewards. We then perform experiments on all Super Hard maps of the StarCraft Multi-Agent Challenge and on six self-designed Ultra Hard maps, showing that DFAC outperforms a number of baselines.
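The idea of generalizing an expected value factorization to return distributions can be illustrated with a small sketch. This is not the DFAC implementation: it assumes each agent's return distribution is represented by quantile values at shared quantile fractions, and it assumes a purely additive (VDN-style) combination. The function names `additive_quantile_factorization` and `quantile_mean` are hypothetical. The point it shows is that summing per-agent quantile values yields a joint distribution whose mean equals the sum of the per-agent means, so the additive expected-value case is recovered.

```python
import numpy as np


def additive_quantile_factorization(agent_quantiles):
    """Combine per-agent quantile values into joint quantile values.

    Assumption (illustrative only): every row holds one agent's quantile
    values, sorted in non-decreasing order, at the same quantile fractions.
    The joint quantile value at each fraction is the sum over agents.
    """
    q = np.asarray(agent_quantiles, dtype=float)
    return q.sum(axis=0)


def quantile_mean(quantiles):
    """Estimate the expected return as the average of equally weighted quantiles."""
    return float(np.mean(quantiles))


# Two hypothetical agents, four quantile fractions each.
agent_quantiles = np.array([
    [0.0, 1.0, 2.0, 3.0],  # agent 1: mean 1.5
    [1.0, 1.0, 5.0, 9.0],  # agent 2: mean 4.0
])

joint = additive_quantile_factorization(agent_quantiles)

# The mean of the joint distribution equals the sum of per-agent means,
# which is the additive factorization of expected values.
sum_of_means = sum(quantile_mean(row) for row in agent_quantiles)
print(joint, quantile_mean(joint), sum_of_means)
```

A learned, non-additive mixing function would replace the plain sum in `additive_quantile_factorization`; the additive form is used here only because it keeps the relation to expected values easy to check by hand.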

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| SMAC 26m_vs_30m | DMIX | Average Score | 19.17 | | Unverified |
| SMAC 26m_vs_30m | QMIX | Average Score | 18.23 | | Unverified |
| SMAC 26m_vs_30m | DPLEX | Average Score | 18.49 | | Unverified |
| SMAC 26m_vs_30m | DDN | Average Score | 18.49 | | Unverified |
| SMAC 26m_vs_30m | QPLEX | Average Score | 18.66 | | Unverified |
| SMAC 26m_vs_30m | VDN | Average Score | 16.69 | | Unverified |
| SMAC 27m_vs_30m | QPLEX | Median Win Rate | 78.12 | | Unverified |
| SMAC 27m_vs_30m | DPLEX | Median Win Rate | 90.62 | | Unverified |
| SMAC 3s5z_vs_3s6z | QPLEX | Median Win Rate | 84.38 | | Unverified |
| SMAC 3s5z_vs_3s6z | DPLEX | Median Win Rate | 90.62 | | Unverified |
| SMAC 3s5z_vs_4s6z | DDN | Average Score | 19.65 | | Unverified |
| SMAC 3s5z_vs_4s6z | DMIX | Average Score | 18.61 | | Unverified |
| SMAC 3s5z_vs_4s6z | VDN | Average Score | 17.16 | | Unverified |
| SMAC 3s5z_vs_4s6z | DPLEX | Average Score | 14.99 | | Unverified |
| SMAC 3s5z_vs_4s6z | QPLEX | Average Score | 13.6 | | Unverified |
| SMAC 3s5z_vs_4s6z | QMIX | Average Score | 13.09 | | Unverified |
| SMAC 6h_vs_8z | DPLEX | Median Win Rate | 43.75 | | Unverified |
| SMAC 6h_vs_8z | QPLEX | Average Score | 15.95 | | Unverified |
| SMAC 6h_vs_9z | QMIX | Average Score | 12.37 | | Unverified |
| SMAC 6h_vs_9z | VDN | Average Score | 13.57 | | Unverified |
| SMAC 6h_vs_9z | QPLEX | Average Score | 13.86 | | Unverified |
| SMAC 6h_vs_9z | DMIX | Average Score | 13.73 | | Unverified |
| SMAC 6h_vs_9z | DDN | Average Score | 16 | | Unverified |
| SMAC 6h_vs_9z | DPLEX | Average Score | 14.84 | | Unverified |
| SMAC corridor | DPLEX | Median Win Rate | 81.25 | | Unverified |
| SMAC corridor | QPLEX | Median Win Rate | 75 | | Unverified |
| SMAC corridor_2z_vs_24zg | VDN | Average Score | 7.78 | | Unverified |
| SMAC corridor_2z_vs_24zg | DMIX | Average Score | 7.41 | | Unverified |
| SMAC corridor_2z_vs_24zg | QPLEX | Average Score | 6.44 | | Unverified |
| SMAC corridor_2z_vs_24zg | QMIX | Average Score | 4.8 | | Unverified |
| SMAC corridor_2z_vs_24zg | DDN | Average Score | 11.1 | | Unverified |
| SMAC corridor_2z_vs_24zg | DPLEX | Average Score | 10.71 | | Unverified |
| SMAC MMM2 | DPLEX | Median Win Rate | 96.88 | | Unverified |
| SMAC MMM2 | QPLEX | Median Win Rate | 96.88 | | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | DDN | Average Score | 16.5 | | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | DMIX | Average Score | 16.24 | | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | DPLEX | Average Score | 15.89 | | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | QPLEX | Average Score | 15.52 | | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | QMIX | Average Score | 14.4 | | Unverified |
| SMAC MMM2_7m2M1M_vs_8m4M1M | VDN | Average Score | 13.13 | | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | VDN | Average Score | 17.3 | | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | QMIX | Average Score | 19.01 | | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | QPLEX | Average Score | 19.06 | | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | DMIX | Average Score | 19.33 | | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | DPLEX | Average Score | 19.4 | | Unverified |
| SMAC MMM2_7m2M1M_vs_9m3M1M | DDN | Average Score | 19.45 | | Unverified |
