Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Emile Anand, Ishani Karmarkar
Abstract
Many large-scale platforms and networked control systems feature a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game between a global agent and n homogeneous local agents in a communication-constrained regime, where the global agent observes only a subset of k local agents' states per time step. We propose an alternating learning framework (ALTERNATING-MARL) in which the global agent performs subsampled mean-field Q-learning against a fixed local policy, and the local agents update their policy by optimizing in an induced MDP. We prove that these approximate best-response dynamics converge to an O(1/k)-approximate Nash equilibrium, while yielding a separation in sample complexity between the joint state space and the joint action space. Finally, we validate our results with numerical simulations of multi-robot control and federated optimization.
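To make the alternating structure concrete, the following is a minimal, self-contained Python sketch of the two-phase loop the abstract describes: the global agent runs Q-learning on a subsampled mean-field statistic of k local states while the local policy is held fixed, then the local policy is updated. The ToyEnv, its binary local states and reward, the imitation-style local update in Phase 2, and all hyperparameters (k, alpha, gamma, eps) are illustrative assumptions, not the paper's actual algorithm, induced-MDP best response, or benchmarks.

```python
import random
from collections import defaultdict

class ToyEnv:
    """Hypothetical toy setting: n homogeneous local agents with binary
    states, plus one global agent with binary actions."""
    global_actions = (0, 1)

    def __init__(self, n=50, seed=0):
        self.n = n
        self.rng = random.Random(seed)

    def reset(self):
        self.s_g = 0
        self.locals = [self.rng.randint(0, 1) for _ in range(self.n)]
        return self.s_g, self.locals

    def step(self, a_g, local_policy):
        # Local agents act with the fixed local policy; transitions are noisy.
        acts = [local_policy(s, self.s_g) for s in self.locals]
        self.locals = [a if self.rng.random() < 0.9 else 1 - a for a in acts]
        frac_ones = sum(self.locals) / self.n
        # Toy reward: the global agent is paid for matching the local majority.
        r = 1.0 if (frac_ones >= 0.5) == bool(a_g) else 0.0
        self.s_g = a_g
        return self.s_g, self.locals, r

def mean_field(local_states, k, rng):
    # Empirical mean of a uniform subsample of k local states,
    # rounded so it can serve as a discrete Q-table key.
    return round(sum(rng.sample(local_states, k)) / k, 2)

def alternating_marl(env, k=5, rounds=20, steps=500,
                     alpha=0.1, gamma=0.9, eps=0.1):
    rng = random.Random(1)
    Q = defaultdict(float)            # keys: (global state, mean-field, action)
    policy = lambda s, s_g: s         # initial fixed local policy
    for _ in range(rounds):
        # Phase 1: subsampled mean-field Q-learning for the global agent,
        # holding the local policy fixed.
        s_g, locals_ = env.reset()
        for _ in range(steps):
            mf = mean_field(locals_, k, rng)
            a = (rng.choice(env.global_actions) if rng.random() < eps
                 else max(env.global_actions, key=lambda x: Q[(s_g, mf, x)]))
            s_g2, locals2, r = env.step(a, policy)
            mf2 = mean_field(locals2, k, rng)
            target = r + gamma * max(Q[(s_g2, mf2, x)] for x in env.global_actions)
            Q[(s_g, mf, a)] += alpha * (target - Q[(s_g, mf, a)])
            s_g, locals_ = s_g2, locals2
        # Phase 2: local agents update against the (now fixed) global policy.
        # A crude stand-in for the induced-MDP best response: imitate the
        # global agent's latest state/action.
        policy = lambda s, s_g: s_g
    return Q, policy

if __name__ == "__main__":
    Q, policy = alternating_marl(ToyEnv())
    print(f"learned {len(Q)} Q-entries")
```

The point of the subsampling step is visible in `mean_field`: the Q-table is keyed by a k-sample statistic rather than the full joint local state, so its size is independent of n, which is the mechanism behind the O(1/k) approximation trade-off stated above.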