The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information
2007-12-01NeurIPS 2007Unverified0· sign in to hype
John Langford, Tong Zhang
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We present Epoch-Greedy, an algorithm for multi-armed bandits with observable side information. Epoch-Greedy has the following properties: No knowledge of a time horizon T is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. The regret scales as O(T^2/3 S^1/3) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning.