Coordination without communication: optimal regret in two players multi-armed bandits

2020-02-14

Sébastien Bubeck, Thomas Budzinski

Abstract

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret O(√(T log(T))). We also argue that the extra logarithmic term √(log(T)) should be necessary by proving a lower bound for a full information variant of the problem.
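To make the setup concrete, here is a minimal simulation sketch of the problem: two agents face the same three-armed Bernoulli bandit, cannot exchange messages, and a "collision" means both pull the same arm in the same round. The rank-based strategy below (agent i pulls its i-th empirically best arm) is an illustrative baseline of my own, not the paper's algorithm; the arm means and the helper name `simulate` are assumptions for the example.

```python
import random

def simulate(T=10000, means=(0.9, 0.5, 0.2), seed=0):
    """Two non-communicating agents on the same 3-armed Bernoulli bandit.

    Illustrative baseline only (NOT the paper's strategy): each agent
    keeps its own empirical means and pulls the arm of its own rank,
    so ranks, rather than messages, give implicit coordination.
    Returns (number of collisions, total reward of both agents).
    """
    rng = random.Random(seed)
    counts = [[0] * 3 for _ in range(2)]   # pulls per agent per arm
    sums = [[0.0] * 3 for _ in range(2)]   # reward sums per agent per arm
    collisions = 0
    reward_total = 0.0
    for t in range(T):
        picks = []
        for agent in (0, 1):
            if t < 3:
                arm = t  # initialization: pull each arm once (collides)
            else:
                est = [sums[agent][a] / counts[agent][a] for a in range(3)]
                ranked = sorted(range(3), key=lambda a: -est[a])
                arm = ranked[agent]  # agent i takes its i-th best arm
            picks.append(arm)
        if picks[0] == picks[1]:
            collisions += 1  # both agents pulled the same arm this round
        for agent, arm in enumerate(picks):
            r = 1.0 if rng.random() < means[arm] else 0.0
            counts[agent][arm] += 1
            sums[agent][arm] += r
            reward_total += r
    return collisions, reward_total
```

Because the agents never observe each other, their empirical rankings can disagree, which is exactly when this naive baseline collides; the paper's contribution is a strategy that avoids all collisions with very high probability while keeping near-optimal regret.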
