Gap-Dependent Bounds for Federated Q-learning
Haochen Zhang, Zhong Zheng, Lingzhou Xue
Abstract
We present the first gap-dependent analysis of regret and communication cost for on-policy federated Q-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to √T-type regret bounds and communication cost bounds with a log T term scaling with the number of agents M, states S, and actions A, where T is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a log T-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on MSA from the log T term. Notably, our gap-dependent communication cost bound also yields a better global switching cost when M=1, removing SA from the log T term.
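For context, the "strictly positive suboptimality gap" invoked above is the standard quantity in gap-dependent RL analyses. A minimal sketch of the usual definition for a finite-horizon MDP (the notation here is assumed, not taken from this abstract):

```latex
% Standard suboptimality gap for an episodic MDP with horizon H (notation assumed):
% Q_h^*(s,a) is the optimal action-value at step h; V_h^*(s) the optimal value.
\Delta_h(s,a) \;=\; V_h^*(s) - Q_h^*(s,a),
\qquad
\Delta_{\min} \;=\; \min_{h,s,a}\bigl\{\, \Delta_h(s,a) \;:\; \Delta_h(s,a) > 0 \,\bigr\}.
```

A strictly positive Δ_min (every suboptimal action is suboptimal by a fixed margin) is what allows log T-type bounds in place of worst-case √T-type bounds.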