SOTAVerified

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

2019-02-01Code Available0· sign in to hype

Anastasia Koloskova, Sebastian U. Stich, Martin Jaggi

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

We consider decentralized stochastic optimization with the objective function (e.g. data samples for machine learning task) being distributed over n machines that can only communicate to their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by 1 (=1 meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate O(1/(nT) + 1/(T ^2 )^2) for strongly convex objectives, where T denotes the number of iterations and the eigengap of the connectivity matrix. Despite compression quality and network connectivity affecting the higher order terms, the first term in the rate, O(1/(nT)), is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time O(1/(^2) (1/)) for accuracy > 0. This is (up to our knowledge) the first gossip algorithm that supports arbitrary compressed messages for > 0 and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms do outperform the respective state-of-the-art baselines and CHOCO-SGD can reduce communication by at least two orders of magnitudes.

Tasks

Reproductions