Active Learning of General Halfspaces: Label Queries vs Membership Queries
Ilias Diakonikolas, Daniel M. Kane, Mingchen Ma
Abstract
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces under the Gaussian distribution on R^d in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm is allowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivial improvements over the passive setting. Specifically, we show that any active learner requires label complexity of Ω̃(d/(ε log(m))), where m is the number of unlabeled examples. In particular, to beat the passive label complexity of Õ(d/ε), an active learner requires a pool of 2^poly(d) unlabeled samples. On the positive side, we show that this lower bound can be circumvented with membership query access, even in the agnostic model. Specifically, we give a computationally efficient learner with query complexity of Õ(min{1/p, 1/ε} + d polylog(1/ε)) achieving an error guarantee of O(opt) + ε. Here p ∈ [0, 1/2] is the bias and opt is the 0-1 loss of the optimal halfspace. As a corollary, we obtain a strong separation between the active and membership query models. Taken together, our results characterize the complexity of learning general halfspaces under Gaussian marginals in these models.
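The bias parameter p can be made concrete with a short sketch (not part of the paper; the halfspace parameters below are hypothetical). For a general halfspace f(x) = sign(&lt;w, x&gt; - t) under the standard Gaussian on R^d, the projection &lt;w, x&gt; is distributed as N(0, ||w||^2), so the probability mass of the minority side is p = Φ(-|t|/||w||), where Φ is the standard normal CDF. A homogeneous halfspace (t = 0) has bias p = 1/2; as |t| grows, p shrinks, which is why the query complexity above improves to Õ(1/p) for strongly biased targets.

```python
import math
import random

def halfspace_bias(w, t):
    """Closed-form bias of sign(<w,x> - t) under N(0, I_d).

    <w, x> ~ N(0, ||w||^2), so the minority side has mass Phi(-|t|/||w||).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    # Standard normal CDF via erf: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(-abs(t) / (norm * math.sqrt(2.0))))

def monte_carlo_bias(w, t, n=100_000, seed=0):
    """Empirical check: minority-side fraction over n Gaussian samples."""
    rng = random.Random(seed)
    d = len(w)
    pos = 0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if sum(wi * xi for wi, xi in zip(w, x)) - t >= 0:
            pos += 1
    frac_pos = pos / n
    return min(frac_pos, 1.0 - frac_pos)

# Hypothetical example: ||w|| = 5, t = 5, so p = Phi(-1) ~ 0.1587.
w, t = [3.0, 4.0], 5.0
print(halfspace_bias(w, t))
print(monte_carlo_bias(w, t))
```

The Monte Carlo estimate agrees with the closed form up to sampling error, and setting t = 0 recovers p = 1/2, the unbiased (homogeneous) case.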