Revisiting Agnostic PAC Learning
Steve Hanneke, Kasper Green Larsen, Nikita Zhivotovskiy
Abstract
PAC learning, dating back to Valiant '84 and Vapnik and Chervonenkis '64, '74, is a classic model for studying supervised learning. In the agnostic setting, we have access to a hypothesis set H and a training set of labeled samples (x_1, y_1), …, (x_n, y_n) ∈ X × {-1, 1} drawn i.i.d. from an unknown distribution D. The goal is to produce a classifier h : X → {-1, 1} that is competitive with the hypothesis h^*_D ∈ H having the least probability of mispredicting the label y of a new sample (x, y) ∼ D. Empirical Risk Minimization (ERM) is a natural learning algorithm, where one simply outputs the hypothesis from H making the fewest mistakes on the training data. This simple algorithm is known to have an optimal error in terms of the VC-dimension of H and the number of samples n. In this work, we revisit agnostic PAC learning and first show that ERM is in fact sub-optimal if we treat the performance of the best hypothesis, denoted τ := Pr_D[h^*_D(x) ≠ y], as a parameter. Concretely, we show that ERM, and any other proper learning algorithm, is sub-optimal by a √(ln(1/τ)) factor. We then complement this lower bound with the first learning algorithm achieving an optimal error for nearly the full range of τ. Our algorithm introduces several new ideas that we hope may find further applications in learning theory.
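To make the ERM rule described above concrete, here is a minimal sketch for a finite hypothesis set. The threshold classifiers and the labeling distribution are hypothetical illustrations, not from the paper, which treats general classes H of bounded VC-dimension.

```python
import random

def empirical_risk(h, samples):
    # Fraction of training points the hypothesis mislabels.
    return sum(h(x) != y for x, y in samples) / len(samples)

def erm(hypotheses, samples):
    # Empirical Risk Minimization: output the hypothesis in H
    # making the fewest mistakes on the training data.
    return min(hypotheses, key=lambda h: empirical_risk(h, samples))

# Toy hypothesis set (hypothetical): threshold classifiers on [0, 1).
hypotheses = [lambda x, t=t: 1 if x >= t else -1
              for t in [0.0, 0.25, 0.5, 0.75]]

# Toy data: points labeled by the threshold-0.5 rule, so the best
# hypothesis in H is consistent with the sample (tau = 0 here).
random.seed(0)
samples = [(x, 1 if x >= 0.5 else -1)
           for x in (random.random() for _ in range(100))]

best = erm(hypotheses, samples)
print(empirical_risk(best, samples))  # prints 0.0 on this realizable toy data
```

In the agnostic setting the labels need not be realizable by any hypothesis in H, and the paper's point is that this simple rule (and every proper learner) loses a √(ln(1/τ)) factor relative to the optimal error once τ is treated as a parameter.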