DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

2021-02-16

Wei-Fang Sun, Cheng-Kuang Lee, Chun-Yi Lee

Abstract

In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. To address these issues, we integrate distributional RL and value function factorization methods by proposing a Distributional Value Function Factorization (DFAC) framework, which generalizes expected value function factorization methods to their DFAC variants. DFAC extends the individual utility functions from deterministic variables to random variables, and models the quantile function of the total return as a quantile mixture. To validate DFAC, we demonstrate its ability to factorize a simple two-step matrix game with stochastic rewards, and perform experiments on all Super Hard tasks of the StarCraft Multi-Agent Challenge (SMAC), showing that DFAC can outperform expected value function factorization baselines.
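The quantile-mixture idea can be illustrated with a small sketch. It assumes each agent's utility is represented by a quantile function F_k^{-1}(omega) on (0, 1), and forms the total as F_tot^{-1}(omega) = sum_k beta_k * F_k^{-1}(omega) with non-negative weights beta_k. Because a non-negative weighted sum of non-decreasing functions is non-decreasing, the result is still a valid quantile function. The names `quantile_mixture`, `expected_value` and `QuantileFn`, the uniform example utilities, and the choice beta_k = 1 (a purely additive combination) are hypothetical illustrations, not the authors' implementation. Note that summing quantile functions with unit weights gives the quantile function of a sum of comonotonic random variables, which generally differs from the distribution of a sum of independent utilities.

```python
from typing import Callable, Sequence

# A quantile function maps a probability level omega in (0, 1) to a return value.
QuantileFn = Callable[[float], float]


def quantile_mixture(quantile_fns: Sequence[QuantileFn],
                     weights: Sequence[float]) -> QuantileFn:
    """Combine per-agent quantile functions as sum_k beta_k * F_k^{-1}(omega).

    Non-negative weights keep the result non-decreasing in omega, so it is
    still a valid quantile function. (Hypothetical helper for illustration.)
    """
    if len(quantile_fns) != len(weights):
        raise ValueError("need exactly one weight per quantile function")
    if any(beta < 0 for beta in weights):
        raise ValueError("weights must be non-negative to preserve monotonicity")

    def mixed(omega: float) -> float:
        return sum(beta * q(omega) for q, beta in zip(quantile_fns, weights))

    return mixed


def expected_value(quantile_fn: QuantileFn, n: int = 1000) -> float:
    """Approximate E[Z] = integral_0^1 F^{-1}(omega) d omega with the midpoint rule."""
    return sum(quantile_fn((i + 0.5) / n) for i in range(n)) / n


if __name__ == "__main__":
    # Hypothetical utilities: agent 1 ~ Uniform(0, 2), agent 2 ~ Uniform(-1, 1).
    agent_quantiles = [lambda w: 2.0 * w, lambda w: -1.0 + 2.0 * w]
    # Assumed additive combination: every beta_k = 1.
    total = quantile_mixture(agent_quantiles, [1.0, 1.0])
    print(total(0.25), total(0.5), total(0.75))  # 0.0 1.0 2.0
    print(round(expected_value(total), 6))       # 1.0
```

In this toy case the mixed quantile function is 4 * omega - 1, so its mean equals the sum of the two agents' means (1 + 0), as expected for any weighted sum of quantile functions.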

Benchmark Results

Dataset            Model  Metric               Claimed  Verified  Status
SMAC 27m_vs_30m    VDN    Median Win Rate (%)  63.12    -         Unverified
SMAC 27m_vs_30m    DMIX   Median Win Rate (%)  85.45    -         Unverified
SMAC 27m_vs_30m    DDN    Median Win Rate (%)  91.48    -         Unverified
SMAC 27m_vs_30m    IQL    Median Win Rate (%)  2.27     -         Unverified
SMAC 27m_vs_30m    DIQL   Median Win Rate (%)  6.02     -         Unverified
SMAC 3s5z_vs_3s6z  DDN    Median Win Rate (%)  94.03    -         Unverified
SMAC 3s5z_vs_3s6z  DMIX   Median Win Rate (%)  91.08    -         Unverified
SMAC 3s5z_vs_3s6z  DIQL   Median Win Rate (%)  62.22    -         Unverified
SMAC 6h_vs_8z      DMIX   Median Win Rate (%)  49.43    -         Unverified
SMAC 6h_vs_8z      DIQL   Median Win Rate (%)  0        -         Unverified
SMAC 6h_vs_8z      DDN    Median Win Rate (%)  83.92    -         Unverified
SMAC corridor      DMIX   Median Win Rate (%)  90.45    -         Unverified
SMAC corridor      VDN    Median Win Rate (%)  85.34    -         Unverified
SMAC corridor      DIQL   Median Win Rate (%)  91.62    -         Unverified
SMAC corridor      DDN    Median Win Rate (%)  95.4     -         Unverified
SMAC MMM2          DDN    Median Win Rate (%)  97.22    -         Unverified
SMAC MMM2          DIQL   Median Win Rate (%)  85.23    -         Unverified
SMAC MMM2          DMIX   Median Win Rate (%)  95.11    -         Unverified