Efficient active learning of sparse halfspaces with arbitrary bounded noise
Chicheng Zhang, Jie Shen, Pranjal Awasthi
Abstract
We study active learning of homogeneous s-sparse halfspaces in ℝ^d under the setting where the unlabeled data distribution is isotropic log-concave and each label is flipped with probability at most η, for a parameter η ∈ [0, 1/2), known as bounded noise. Even in the presence of mild label noise, i.e. when η is a small constant, this is a challenging problem and only recently have label complexity bounds of the form Õ(s · polylog(d, 1/ε)) been established in [Zhang, 2018] for computationally efficient algorithms. In contrast, under high levels of label noise, the label complexity bounds achieved by computationally efficient algorithms are much worse: the best known result of [Awasthi et al., 2016] provides a computationally efficient algorithm with label complexity Õ((s ln(d)/ε)^{2^{poly(1/(1−2η))}}), which is label-efficient only when the noise rate η is a fixed constant. In this work, we substantially improve on it by designing a polynomial time algorithm for active learning of s-sparse halfspaces, with a label complexity of Õ(s/(1−2η)^4 · polylog(d, 1/ε)). This is the first efficient algorithm with label complexity polynomial in 1/(1−2η) in this setting, which is label-efficient even for η arbitrarily close to 1/2. Our active learning algorithm and its theoretical guarantees also immediately translate to new state-of-the-art label and sample complexity results for full-dimensional active and passive halfspace learning under arbitrary bounded noise. The key insight of our algorithm and analysis is a new interpretation of online learning regret inequalities, which may be of independent interest.
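To make the setting concrete, below is a minimal sketch of the bounded-noise data model the abstract describes, assuming a standard Gaussian N(0, I_d) as the isotropic log-concave unlabeled distribution (one valid instance of that class). The names `w_star` and `draw_example` and the specific parameter values are illustrative, not taken from the paper, and this is the data model only, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s, eta = 1000, 10, 0.45  # ambient dimension, sparsity, noise rate eta in [0, 1/2)

# s-sparse, unit-norm target halfspace w_star (illustrative construction)
w_star = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
w_star[support] = rng.standard_normal(s)
w_star /= np.linalg.norm(w_star)

def draw_example():
    """Draw one (x, y) pair: x ~ N(0, I_d), an isotropic log-concave
    distribution; the clean label sign(<w_star, x>) is flipped with
    probability eta. The bounded noise model allows any flip rate <= eta,
    so flipping at exactly eta is the worst case."""
    x = rng.standard_normal(d)
    y = 1.0 if w_star @ x >= 0 else -1.0
    if rng.random() < eta:
        y = -y
    return x, y

X, Y = zip(*(draw_example() for _ in range(100)))
```

In the active setting, the learner observes the unlabeled points x essentially for free and pays only for the labels y it chooses to query; the label complexity bounds quoted above count exactly those queries.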