Scaling Multi-Armed Bandit Algorithms

2019-07-25 · KDD 2019

Edouard Fouché, Junpei Komiyama, Klemens Böhm

Abstract

The Multi-Armed Bandit (MAB) is a fundamental model capturing the dilemma between exploration and exploitation in sequential decision making. At every time step, the decision maker selects a set of arms and observes a reward from each of the chosen arms. In this paper, we present a variant of the problem, which we call the Scaling MAB (S-MAB): The goal of the decision maker is not only to maximize the cumulative rewards, i.e., choosing the arms with the highest expected reward, but also to decide how many arms to select so that, in expectation, the cost of selecting arms does not exceed the rewards. This problem is relevant to many real-world applications, e.g., online advertising, financial investments, or data stream monitoring. We propose an extension of Thompson Sampling, which has strong theoretical guarantees and is reported to perform well in practice. Our extension dynamically controls the number of arms to draw. Furthermore, we combine the proposed method with ADWIN, a state-of-the-art change detector, to deal with non-static environments. We illustrate the benefits of our contribution via a real-world use case on predictive maintenance.
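The idea of choosing both which arms to play and how many can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Bernoulli rewards, Beta(1, 1) priors, a fixed hypothetical `cost_per_arm`, and a simple scaling rule (play the largest number of arms whose average posterior mean still covers the per-arm cost). The function name and all parameters are made up for illustration, and the ADWIN change detector is left out entirely.

```python
import random


def scaling_thompson_sampling(true_means, horizon, cost_per_arm=0.7, seed=0):
    """Toy multiple-play Thompson Sampling with an assumed scaling rule.

    true_means   -- hidden Bernoulli success probability of each arm
    horizon      -- number of rounds to simulate
    cost_per_arm -- hypothetical fixed cost paid for every arm played
    Returns (arms played per round, total reward, total cost).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    n_plays = k  # assumption: start by playing every arm
    total_reward = 0.0
    total_cost = 0.0
    history = []

    for _ in range(horizon):
        # Thompson step: draw one sample per arm from its Beta posterior
        # and play the n_plays arms with the largest samples.
        samples = [
            rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)
        ]
        chosen = sorted(range(k), key=lambda i: samples[i], reverse=True)[:n_plays]

        for i in chosen:
            reward = 1 if rng.random() < true_means[i] else 0
            successes[i] += reward
            failures[i] += 1 - reward
            total_reward += reward
        total_cost += cost_per_arm * len(chosen)
        history.append(len(chosen))

        # Scaling step (assumed rule, not taken from the paper): keep the
        # largest set size whose summed posterior means cover its cost.
        post_means = sorted(
            ((1 + successes[i]) / (2 + successes[i] + failures[i]) for i in range(k)),
            reverse=True,
        )
        n_plays = 1  # always play at least one arm
        running = 0.0
        for size, mean in enumerate(post_means, start=1):
            running += mean
            if running >= cost_per_arm * size:
                n_plays = size

    return history, total_reward, total_cost


if __name__ == "__main__":
    plays, reward, cost = scaling_thompson_sampling([0.9, 0.8, 0.2, 0.1], 2000)
    print("arms played in last round:", plays[-1])
    print("total reward:", reward, "total cost:", cost)
```

In this toy setup the two strong arms average 0.85 reward while adding a third arm drops the average below the assumed cost of 0.7, so a reasonable policy should settle on playing about two arms per round. Handling non-static environments would additionally require resetting or windowing the posterior counts when a change is detected, which this sketch does not attempt.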

Tasks

Multi-Armed Bandits, Sequential Decision Making, Thompson Sampling
