Optimal Sketching Bounds for Sparse Linear Regression
Tung Mai, Alexander Munteanu, Cameron Musco, Anup B. Rao, Chris Schwiegelshohn, David P. Woodruff
Abstract
We study oblivious sketching for $k$-sparse linear regression under various loss functions such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k\log(d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows, showing that $\tilde{O}(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2 + \lambda\|x\|_1$ over $x \in \mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda\varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.
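To make the oblivious sketching setup concrete, here is a minimal numerical sketch using a dense Gaussian sketching matrix — an illustrative stand-in, not the paper's construction, and with no sparsity-adapted row count. It shows the basic mechanism: $S$ is drawn independently of $(A, b)$, and for a fixed candidate $x$ the sketched cost $\|SAx - Sb\|_2$ approximates the true cost $\|Ax - b\|_2$ up to small relative error. The dimensions $n$, $d$, $m$ below are arbitrary assumptions for the demo.

```python
import numpy as np

# Illustrative oblivious sketch for l2 regression costs.
# A dense Gaussian sketch S with m rows, scaled by 1/sqrt(m), is drawn
# independently of (A, b) -- this independence is what "oblivious" means.
# The paper's result is that for k-sparse solutions, Theta(k log(d/k)/eps^2)
# rows suffice; here we simply use a generic m for demonstration.
rng = np.random.default_rng(0)
n, d, m = 2000, 50, 400  # assumed demo dimensions

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

S = rng.standard_normal((m, n)) / np.sqrt(m)  # oblivious Gaussian sketch
SA, Sb = S @ A, S @ b

x = rng.standard_normal(d)  # any fixed candidate solution
true_cost = np.linalg.norm(A @ x - b)
sketch_cost = np.linalg.norm(SA @ x - Sb)
ratio = sketch_cost / true_cost  # concentrates near 1 as m grows
```

For a Gaussian sketch, `ratio` deviates from 1 by roughly $1/\sqrt{m}$ with high probability; the interesting question, answered in the paper, is how small $m$ can be when the guarantee is only required over $k$-sparse $x$ rather than all of $\mathbb{R}^d$.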