
Error Feedback Shines when Features are Rare

2023-05-24

Peter Richtárik, Elnur Gasanov, Konstantin Burlachenko


Abstract

We provide the first proof that gradient descent (GD) with greedy sparsification (TopK) and error feedback (EF) can obtain better communication complexity than vanilla GD when solving the distributed optimization problem $\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$, where $n$ is the number of clients, $d$ is the number of features, and $f_1,\dots,f_n$ are smooth nonconvex functions. Despite intensive research since 2014, when EF was first proposed by Seide et al., this problem remained open until now. We show that EF shines in the regime when features are rare, i.e., when each feature is present in the data owned by a small number of clients only. To illustrate our main result, we show that in order to find a random vector $\hat{x}$ such that $\|\nabla f(\hat{x})\|^2 \le \varepsilon$ in expectation, GD with the Top1 sparsifier and EF requires $\mathcal{O}\left(\left(L + r\sqrt{\frac{c}{n}\min\left(\frac{c}{n}\max_i L_i^2,\ \frac{1}{n}\sum_{i=1}^n L_i^2\right)}\right)\frac{1}{\varepsilon}\right)$ bits to be communicated by each worker to the server only, where $L$ is the smoothness constant of $f$, $L_i$ is the smoothness constant of $f_i$, $c$ is the maximal number of clients owning any feature ($1 \le c \le n$), and $r$ is the maximal number of features owned by any client ($1 \le r \le d$). Clearly, the communication complexity improves as $c$ decreases (i.e., as features become more rare), and can be much better than the $\mathcal{O}(rL\frac{1}{\varepsilon})$ communication complexity of GD in the same regime.
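To make the mechanism concrete, here is a minimal NumPy sketch of classic error feedback (in the spirit of Seide et al.) combined with a Top1 sparsifier for distributed GD on a toy quadratic. It is an illustration of the general EF technique only, not the exact variant, step sizes, or constants analyzed in this paper; all names and the toy objective are ours.

```python
import numpy as np

def topk(v, k=1):
    """Greedy TopK sparsifier: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_gd(grads, x0, lr, steps, k=1):
    """Distributed GD with TopK compression and (classic) error feedback.

    grads: list of per-client gradient functions for f_i; the objective is
    their average f. Each client keeps an error buffer e_i accumulating
    whatever the compressor discarded, and adds it back next round.
    """
    x = x0.copy()
    n = len(grads)
    e = [np.zeros_like(x0) for _ in range(n)]
    for _ in range(steps):
        msgs = []
        for i, g in enumerate(grads):
            v = e[i] + lr * g(x)   # reinject past compression error
            m = topk(v, k)         # only k entries are communicated
            e[i] = v - m           # remember what was left out
            msgs.append(m)
        x -= sum(msgs) / n         # server applies the averaged sparse update
    return x

# Toy problem: f_i(x) = 0.5 * ||x - b_i||^2, so the minimizer of f is the mean of the b_i.
b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
grads = [lambda x, bi=bi: x - bi for bi in b]
x_hat = ef_gd(grads, np.zeros(2), lr=0.1, steps=500, k=1)
```

On this example each client transmits a single nonzero coordinate per round, yet the error buffers ensure no gradient information is permanently lost, so the iterates still approach the minimizer $(0.5, 0.5)$.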
