Thompson sampling for linear quadratic mean-field teams

2020-11-09

Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Abstract

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., the empirical mean) of the states and controls. Directly applying single-agent LQ learning algorithms to such models results in regret that increases polynomially with the number of agents. We propose a new Thompson-sampling-based learning algorithm that exploits the structure of the system model, and we show that the expected Bayesian regret of the proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{O}(|M|^{1.5}\sqrt{T})$, irrespective of the total number of agents, where the $\tilde{O}$ notation hides logarithmic factors in $T$. We present detailed numerical experiments to illustrate the salient features of the proposed algorithm.
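
The kind of structure that mean-field coupling provides can be illustrated with a small numerical check. The sketch below is not the paper's algorithm. It assumes a hypothetical scalar model with a single agent type, x_i' = a x_i + b u_i + d x̄ + e ū + w_i, where x̄ and ū are the empirical means; this form, the function name `step`, and all parameter values are illustrative assumptions that the abstract does not state. Under that assumption, the mean-field evolves as a linear system with parameters (a + d, b + e), and each agent's deviation from the mean evolves with parameters (a, b), so the number of unknown parameters does not depend on the number of agents.

```python
import numpy as np


def step(x, u, a, b, d, e, w):
    """One step of a hypothetical scalar mean-field-coupled system.

    x, u, w are arrays with one entry per agent (single agent type).
    The coupling enters only through the empirical means of x and u.
    """
    x_bar, u_bar = x.mean(), u.mean()
    return a * x + b * u + d * x_bar + e * u_bar + w


rng = np.random.default_rng(0)
n_agents = 50
a, b, d, e = 0.9, 0.5, 0.2, -0.1  # illustrative values, not from the paper

x = rng.normal(size=n_agents)        # current states
u = rng.normal(size=n_agents)        # arbitrary controls
w = 0.1 * rng.normal(size=n_agents)  # process noise

x_next = step(x, u, a, b, d, e, w)

# The mean-field follows its own linear system with parameters (a + d, b + e).
mean_pred = (a + d) * x.mean() + (b + e) * u.mean() + w.mean()

# Deviations from the mean follow a linear system with parameters (a, b) only.
dev_pred = a * (x - x.mean()) + b * (u - u.mean()) + (w - w.mean())

mean_ok = bool(np.isclose(x_next.mean(), mean_pred))
dev_ok = bool(np.allclose(x_next - x_next.mean(), dev_pred))
print(mean_ok, dev_ok)
```

Changing `n_agents` leaves both checks intact, which is consistent with a regret bound that depends on the number of types rather than the number of agents.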
