Optimal Estimator for Linear Regression with Shuffled Labels
Hang Zhang, Ping Li
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
This paper considers the task of linear regression with shuffled labels, i.e., Y = X B + W, where Y R^n m, Pi R^n n, X R^n p, B R^p m, and W R^n m, respectively, represent the sensing results, (unknown or missing) corresponding information, sensing matrix, signal of interest, and additive sensing noise. Given the observation Y and sensing matrix X, we propose a one-step estimator to reconstruct ( , B). From the computational perspective, our estimator's complexity is O(n^3 + np^2m), which is no greater than the maximum complexity of a linear assignment algorithm (e.g., O(n^3)) and a least square algorithm (e.g., O(np^2 m)). From the statistical perspective, we divide the minimum snr requirement into four regimes, e.g., unknown, hard, medium, and easy regimes; and present sufficient conditions for the correct permutation recovery under each regime: (i) snr (1) in the easy regime; (ii) snr ( n) in the medium regime; and (iii) snr (( n)^c_0 n^c_1/srank( B)) in the hard regime (c_0, c_1 are some positive constants and srank( B) denotes the stable rank of B). In the end, we also provide numerical experiments to confirm the above claims.