DiffRed: Dimensionality Reduction guided by stable rank
Prarabdh Shukla, Gagan Raj Gupta, Kunal Dutta
Code: github.com/s3-lab-iit/diffred (official)
Abstract
In this work, we propose a novel dimensionality reduction technique, DiffRed, which first projects the data matrix $A$ along its first $k_1$ principal components and then projects the residual matrix $A^*$ (what remains after subtracting the rank-$k_1$ approximation of $A$) along $k_2$ Gaussian random vectors. We evaluate two metrics: M1, the distortion of the mean-squared pairwise distance, and Stress, the normalized RMS distortion of the pairwise distances. We rigorously prove that DiffRed achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on Stress and $O\left(\frac{1-p}{\sqrt{k_2 \cdot \rho(A^*)}}\right)$ on M1, where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^*)$ is the stable rank of $A^*$. These bounds are tighter than the currently known results for random maps. Our extensive experiments on a variety of real-world datasets demonstrate that DiffRed achieves near-zero M1 and much lower Stress than well-known dimensionality reduction techniques. In particular, DiffRed can map a 6-million-dimensional dataset to 10 dimensions with 54% lower Stress than PCA.
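The two-stage projection and the two metrics described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names (`diffred_embed`, `stable_rank`, `stress`, `m1`), the mean-centering step, the $1/\sqrt{k_2}$ scaling of the Gaussian matrix, and the exact formulas used for Stress and M1 are all assumptions chosen to match the abstract's verbal description.

```python
import numpy as np

def diffred_embed(A, k1, k2, rng=None):
    """Hypothetical sketch: PCA on A plus a random projection of the residual."""
    rng = np.random.default_rng(rng)
    A = A - A.mean(axis=0)                       # assumption: center the data
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Z_pca = A @ Vt[:k1].T                        # coordinates along first k1 PCs
    A_res = A - (U[:, :k1] * s[:k1]) @ Vt[:k1]   # residual A^* = A - rank-k1 approx
    # assumption: i.i.d. N(0, 1) entries scaled by 1/sqrt(k2)
    G = rng.standard_normal((A.shape[1], k2)) / np.sqrt(k2)
    Z_rand = A_res @ G                           # residual along k2 random vectors
    return np.hstack([Z_pca, Z_rand])            # n x (k1 + k2) embedding

def stable_rank(M):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    return np.linalg.norm(M, "fro") ** 2 / np.linalg.norm(M, 2) ** 2

def _pairwise_dists(X):
    """Euclidean distances for all pairs i < j, as a flat vector."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    iu = np.triu_indices(X.shape[0], k=1)
    return np.sqrt(np.maximum(d2[iu], 0.0))      # clip tiny negatives from rounding

def stress(X, Z):
    """Assumed form: RMS distance distortion normalized by original distances."""
    d, dz = _pairwise_dists(X), _pairwise_dists(Z)
    return np.sqrt(np.sum((d - dz) ** 2) / np.sum(d ** 2))

def m1(X, Z):
    """Assumed form: relative distortion of the mean-squared pairwise distance."""
    d, dz = _pairwise_dists(X), _pairwise_dists(Z)
    return abs(1.0 - np.sum(dz ** 2) / np.sum(d ** 2))
```

For example, `Z = diffred_embed(A, k1=3, k2=7)` maps an n x D matrix to 10 dimensions, after which `stress(A, Z)` and `m1(A, Z)` can be compared against a plain PCA or purely random projection to the same target dimension. The full SVD used here is only practical for small matrices; a very high-dimensional dataset would need a truncated or randomized decomposition instead.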