On Agnostic PAC Learning in the Small Error Regime
Julian Asilis, Mikael Møller Høgsgaard, Grigoris Velegkas
Abstract
Binary classification in the classic PAC model exhibits a curious phenomenon: Empirical Risk Minimization (ERM) learners are suboptimal in the realizable case yet optimal in the agnostic case. Roughly speaking, this owes itself to the fact that non-realizable distributions $\mathcal{D}$ are simply more difficult to learn than realizable distributions, even when one discounts a learner's error by $\mathrm{err}(h^*_{\mathcal{D}})$, the error of the best hypothesis in $\mathcal{H}$ for $\mathcal{D}$. Thus, optimal agnostic learners are permitted to incur excess error on (easier-to-learn) distributions $\mathcal{D}$ for which $\tau = \mathrm{err}(h^*_{\mathcal{D}})$ is small.

Recent work of Hanneke, Larsen, and Zhivotovskiy (FOCS '24) addresses this shortcoming by including $\tau$ itself as a parameter in the agnostic error term. In this more fine-grained model, they demonstrate tightness of the error lower bound $\tau + \Omega\left(\sqrt{\tau (d + \ln(1/\delta))/m} + (d + \ln(1/\delta))/m\right)$ in a regime where $\tau > d/m$, and leave open the question of whether there may be a higher lower bound when $\tau \ll d/m$, with $d$ denoting $\mathrm{VC}(\mathcal{H})$.

In this work, we resolve this question by exhibiting a learner which achieves error $c\,\tau + O\left(\sqrt{\tau (d + \ln(1/\delta))/m} + (d + \ln(1/\delta))/m\right)$ for a constant $c \leq 2.1$, thus matching the lower bound when $\tau \ll d/m$. Further, our learner is computationally efficient and is based upon careful aggregations of ERM classifiers, making progress on two other questions of Hanneke, Larsen, and Zhivotovskiy (FOCS '24). We leave open the interesting question of whether our approach can be refined to lower the constant from 2.1 to 1, which would completely settle the complexity of agnostic learning.
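As a quick sanity check on the regimes (our own worked step, not a statement from the paper beyond its quoted bound), write $A = (d + \ln(1/\delta))/m$ for the additive term. Whenever $\tau \leq A$, both $\tau$-dependent terms are dominated by $A$:

\[
c\,\tau + O\!\left(\sqrt{\tau A} + A\right)
\;\leq\; c\,A + O\!\left(\sqrt{A \cdot A} + A\right)
\;=\; O(A)
\;=\; O\!\left(\frac{d + \ln(1/\delta)}{m}\right).
\]

So in the small-error regime $\tau \ll d/m$, the upper bound collapses to the realizable-style rate $O((d + \ln(1/\delta))/m)$, matching the lower bound up to constants; the constant $c \leq 2.1$ on the leading term matters only when $\tau$ is the dominant quantity.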