Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
Sandeep Silwal, David P. Woodruff, Qiuyi Zhang
Abstract
We study beyond worst-case dimensionality reduction for $s$-sparse vectors. Our work is divided into two parts, each focusing on a different facet of beyond worst-case analysis:

We first consider average-case guarantees. A folklore upper bound based on the birthday paradox states: for any collection $X$ of $s$-sparse vectors in $\mathbb{R}^d$, there exists a linear map to $\mathbb{R}^{O(s^2)}$ which exactly preserves the norm of $99\%$ of the vectors in $X$ in any $\ell_p$ norm (as opposed to the usual setting where guarantees hold for all vectors). We give lower bounds showing that this is indeed optimal in many settings: any oblivious linear map satisfying similar average-case guarantees must map to $\Omega(s^2)$ dimensions. The same lower bound also holds for a wide class of smooth maps, including `encoder-decoder schemes', where we compare the norm of the original vector to that of a smooth function of the embedding. These lower bounds reveal a separation result, as an upper bound of $O(s \log(d))$ is possible if we instead use arbitrary (possibly non-smooth) functions, e.g., via compressed sensing algorithms.

Given these lower bounds, we specialize to sparse non-negative vectors. For a dataset $X$ of non-negative $s$-sparse vectors and any $p \geq 1$, we can non-linearly embed $X$ into $O(s \log(|X|s)/\epsilon^2)$ dimensions while preserving all pairwise distances in the $\ell_p$ norm up to $1 \pm \epsilon$, with no dependence on $p$. Surprisingly, the non-negativity assumption enables much smaller embeddings than are possible for arbitrary sparse vectors, where the best known bounds suffer an exponential dependence. Our map also guarantees exact dimensionality reduction for $\ell_\infty$ by embedding into $O(s \log |X|)$ dimensions, which is tight. We show that both the non-linearity of our map $f$ and the non-negativity of $X$ are necessary, and provide downstream algorithmic improvements.
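As a sanity check on the folklore birthday-paradox upper bound above, here is a minimal numerical sketch (not taken from the paper; the constant $50$, the dimensions, and the test setup are illustrative assumptions). Each of the $d$ coordinates is hashed into one of $m = 50s^2$ buckets; for a fixed $s$-sparse vector, its $s$ nonzero coordinates land in distinct buckets with probability at least $1 - \binom{s}{2}/m \geq 0.99$, and in that event every $\ell_p$ norm is preserved exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

d, s = 100_000, 10      # ambient dimension and sparsity (illustrative choices)
m = 50 * s * s          # target dimension O(s^2); constant 50 gives >= 99% success

# Oblivious linear map: hash each coordinate into one of m buckets and sum.
# This is linear (y = Ax for a 0/1 matrix A with one nonzero per column).
buckets = rng.integers(0, m, size=d)

def embed(x: np.ndarray) -> np.ndarray:
    """Sum the entries of x that hash to the same bucket."""
    y = np.zeros(m)
    np.add.at(y, buckets, x)
    return y

# Draw a random s-sparse test vector.
x = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
x[support] = rng.standard_normal(s)

# If the s nonzero coordinates hash to distinct buckets (probability >= 0.99
# by a union bound over the <= s^2/2 pairs), the nonzero entries of y are a
# permutation of those of x, so every l_p norm is preserved exactly.
y = embed(x)
for p in (1, 2, 3):
    print(f"p={p}: ||x||_p={np.linalg.norm(x, p):.6f}  ||y||_p={np.linalg.norm(y, p):.6f}")
```

Note that this only illustrates the folklore per-vector guarantee; the paper's contributions concern the matching $\Omega(s^2)$ lower bounds for such maps and the improved non-linear embeddings for non-negative vectors, whose constructions are not described in the abstract.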