SOTAVerified

Adaptive Clustering and Personalization in Multi-Agent Stochastic Linear Bandits

2021-06-15Unverified0· sign in to hype

Avishek Ghosh, Abishek Sankararaman, Kannan Ramchandran

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We consider the problem of minimizing regret in an N agent heterogeneous stochastic linear bandits framework, where the agents (users) are similar but not all identical. We model user heterogeneity using two popularly used ideas in practice; (i) A clustering framework where users are partitioned into groups with users in the same group being identical to each other, but different across groups, and (ii) a personalization framework where no two users are necessarily identical, but a user's parameters are close to that of the population average. In the clustered users' setup, we propose a novel algorithm, based on successive refinement of cluster identities and regret minimization. We show that, for any agent, the regret scales as O(T/N), if the agent is in a `well separated' cluster, or scales as O(T^12 + /(N)^12 -) if its cluster is not well separated, where is positive and arbitrarily close to 0. Our algorithm is adaptive to the cluster separation, and is parameter free -- it does not need to know the number of clusters, separation and cluster size, yet the regret guarantee adapts to the inherent complexity. In the personalization framework, we introduce a natural algorithm where, the personal bandit instances are initialized with the estimates of the global average model. We show that, an agent i whose parameter deviates from the population average by _i, attains a regret scaling of O(_iT). This demonstrates that if the user representations are close (small _i), the resulting regret is low, and vice-versa. The results are empirically validated and we observe superior performance of our adaptive algorithms over non-adaptive baselines.

Tasks

Reproductions