SOTAVerified

Robust Multi-Agent Multi-Armed Bandits

2020-07-07Unverified0· sign in to hype

Daniel Vial, Sanjay Shakkottai, R. Srikant

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Recent works have shown that agents facing independent instances of a stochastic K-armed bandit can collaborate to decrease regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include n honest and m malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We first show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with (i.e., "block") them. We show that collaboration indeed decreases regret for this algorithm, assuming m is small compared to K but without assumptions on malicious agents' behavior, thus ensuring that our algorithm is robust against any malicious recommendation strategy.

Tasks

Reproductions