Error Feedback Shines when Features are Rare
Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko
Code (official, PyTorch): github.com/burlachenkok/ef21_with_rare_features
Abstract
We provide the first proof that gradient descent (GD) with greedy sparsification (TopK) and error feedback (EF) can obtain better communication complexity than vanilla GD when solving the distributed optimization problem $\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$, where $n$ = # of clients, $d$ = # of features, and $f_1, \dots, f_n$ are smooth nonconvex functions. Despite intensive research since 2014, when EF was first proposed by Seide et al., this problem remained open until now. We show that EF shines in the regime when features are rare, i.e., when each feature is present in the data owned by a small number of clients only. To illustrate our main result, we show that in order to find a random vector $\hat{x}$ such that $\|\nabla f(\hat{x})\|^2 \le \varepsilon$ in expectation, GD with the Top1 sparsifier and EF requires $\mathcal{O}\left(\left(L + r\sqrt{\frac{c}{n}\min\left(\frac{c}{n}\max_i L_i^2,\ \frac{1}{n}\sum_{i=1}^n L_i^2\right)}\right)\frac{1}{\varepsilon}\right)$ bits to be communicated by each worker to the server only, where $L$ is the smoothness constant of $f$, $L_i$ is the smoothness constant of $f_i$, $c$ is the maximal number of clients owning any feature ($1 \le c \le n$), and $r$ is the maximal number of features owned by any client ($1 \le r \le d$). Clearly, the communication complexity improves as $c$ decreases (i.e., as features become more rare), and can be much better than the $\mathcal{O}\left(rL\frac{1}{\varepsilon}\right)$ communication complexity of GD in the same regime.
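To make the method concrete, here is a minimal NumPy sketch of distributed GD with a Top-$k$ sparsifier and EF21-style error feedback: each client communicates only the compressed *difference* between its fresh gradient and its local state $g_i$, and the server descends along the running average of the states. The function names (`top_k`, `ef21_gd`) and hyperparameters are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def top_k(v, k):
    # Greedy Top-k sparsifier: keep the k largest-magnitude entries, zero the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_gd(grad_fns, x0, step, k=1, iters=3000):
    """EF21 sketch: clients send Top-k compressed gradient differences.

    grad_fns : list of callables, grad_fns[i](x) = grad of f_i at x
    """
    n = len(grad_fns)
    x = x0.copy()
    g = [np.zeros_like(x0) for _ in range(n)]  # client states g_i
    g_avg = np.mean(g, axis=0)                 # server state (average of g_i)
    for _ in range(iters):
        x = x - step * g_avg                   # descent along aggregated state
        for i in range(n):
            # Only this sparse vector c is communicated to the server.
            c = top_k(grad_fns[i](x) - g[i], k)
            g[i] = g[i] + c
        g_avg = np.mean(g, axis=0)
    return x
```

For example, with $n = 2$ clients holding quadratics $f_i(x) = \tfrac12\|x - a_i\|^2$, the iterates approach the minimizer $\bar a$ of $f$ even though each client sends a single coordinate per round.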