Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation
Alexander Munteanu, Simon Omlor
Abstract
Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling has received a lot of attention: it samples points proportionally to an individual importance measure called sensitivity. In very general settings, this framework reduces the size of the data to roughly the VC dimension $d$ times the total sensitivity $S$ while providing strong $(1\pm\varepsilon)$ guarantees on the quality of approximation. The recent work of Woodruff & Yasuda (2023c) improved substantially over the general $O(\varepsilon^{-2} S d)$ bound for the important problem of $\ell_p$ subspace embeddings, obtaining $O(\varepsilon^{-2} S^{2/p})$ for $p \in [1,2]$. Their result was subsumed by an earlier $O(\varepsilon^{-2} S d^{1-p/2})$ bound which was implicitly given in the work of Chen & Derezinski (2021). We show that their result is tight when sampling according to plain $\ell_p$ sensitivities. We observe that by augmenting the $\ell_p$ sensitivities with $\ell_2$ sensitivities, we obtain better bounds, improving over the aforementioned results to the optimal linear $O(\varepsilon^{-2}(S+d)) = O(\varepsilon^{-2} d)$ sampling complexity for all $p \in [1,2]$. In particular, this resolves an open question of Woodruff & Yasuda (2023c) in the affirmative for $p \in [1,2]$ and brings sensitivity subsampling into the regime that was previously known to be possible only using Lewis weights (Cohen & Peng, 2015). As an application of our main result, we also obtain an $O(\varepsilon^{-2} \mu d)$ sensitivity sampling bound for logistic regression, where $\mu$ is a natural complexity measure for this problem. This improves over the previous $O(\varepsilon^{-2} \mu^2 d)$ bound of Mai et al. (2021), which was based on Lewis weight subsampling.
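To illustrate the general shape of sensitivity sampling discussed above, the following minimal NumPy sketch samples rows of a matrix with probabilities proportional to augmented importance scores and rescales them so the sampled $\ell_p$ cost is unbiased. This is not the paper's algorithm: the exact $\ell_p$ sensitivities are replaced by a crude proxy built from $\ell_2$ leverage scores (a hypothetical stand-in chosen for brevity), and the augmentation simply adds the $\ell_2$ leverage scores, mirroring the $\ell_2$-augmentation idea at a high level only.

```python
import numpy as np

def l2_leverage_scores(A):
    """Exact l2 leverage scores via a thin QR factorization.

    For A = QR with orthonormal Q, the i-th leverage score is
    ||Q[i, :]||_2^2; these sum to rank(A) and are the l2
    sensitivities of the subspace embedding problem.
    """
    Q, _ = np.linalg.qr(A)  # reduced QR: Q is n x d
    return np.sum(Q**2, axis=1)

def augmented_sampling(A, k, p=1.0, seed=None):
    """Sample k rescaled rows of A by augmented importance sampling.

    Hypothetical illustration: lev**(p/2) serves as a rough proxy for
    the l_p sensitivities, and the raw l2 leverage scores are added
    on top (the augmentation). Each sampled row is rescaled by
    (k * p_i)^(-1/p), which makes the sampled l_p^p cost an unbiased
    estimator of the full cost sum_i |a_i x|^p for any fixed x.
    """
    rng = np.random.default_rng(seed)
    lev = l2_leverage_scores(A)
    scores = lev ** (p / 2.0) + lev        # augmented scores
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=k, replace=True, p=probs)
    weights = (k * probs[idx]) ** (-1.0 / p)
    return A[idx] * weights[:, None], idx
```

A usage example: for a tall random matrix, the routine returns a $k \times d$ reweighted sample whose rows can stand in for the full matrix in downstream $\ell_p$ regression; the quality guarantees proved in the paper require the true (augmented) sensitivities rather than this proxy.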