Mean-Field Controls with Q-learning for Cooperative MARL: Convergence and Complexity Analysis

2020-02-10

Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu

Abstract

Multi-agent reinforcement learning (MARL), despite its popularity and empirical success, suffers from the curse of dimensionality. This paper builds the mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach, and shows that the approximation error is of $O(1/\sqrt{N})$. By establishing an appropriate form of the dynamic programming principle for both the value function and the Q function, it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature. It further establishes that the convergence rate and the sample complexity of MFC-K-Q are independent of the number of agents $N$, which provides an $O(1/\sqrt{N})$ approximation to the MARL problem with $N$ agents in the learning environment. Empirical studies for the network traffic congestion problem demonstrate that MFC-K-Q outperforms existing MARL algorithms when $N$ is large, for instance when $N > 50$.

Tasks

Multi-agent Reinforcement Learning, Q-Learning, Reinforcement Learning
