_p-Regression in the Arbitrary Partition Model of Communication
Yi Li, Honghao Lin, David P. Woodruff
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
We consider the randomized communication complexity of the distributed _p-regression problem in the coordinator model, for p (0,2]. In this problem, there is a coordinator and s servers. The i-th server receives A^i\-M, -M+1, , M\^n d and b^i\-M, -M+1, , M\^n and the coordinator would like to find a (1+)-approximate solution to _xR^n \|(_i A^i)x - (_i b^i)\|_p. Here M poly(nd) for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For p = 2, i.e., least squares regression, we give the first optimal bound of (sd^2 + sd/) bits. For p (1,2),we obtain an O(sd^2/ + sd/poly()) upper bound. Notably, for d sufficiently large, our leading order term only depends linearly on 1/ rather than quadratically. We also show communication lower bounds of (sd^2 + sd/^2) for p (0,1] and (sd^2 + sd/) for p (1,2]. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).