Byzantine Stochastic Gradient Descent

2018-03-23 · NeurIPS 2018

Dan Alistarh, Zeyuan Allen-Zhu, Jerry Li

Abstract

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the $m$ machines which allegedly compute stochastic gradients every iteration, an $\alpha$-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds $\varepsilon$-approximate minimizers of convex functions in $T = \tilde{O}\left(\frac{1}{\varepsilon^2 m} + \frac{\alpha^2}{\varepsilon^2}\right)$ iterations. In contrast, traditional mini-batch SGD needs $T = O\left(\frac{1}{\varepsilon^2 m}\right)$ iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
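To make the setting concrete, here is a minimal simulation sketch of Byzantine-tolerant distributed SGD. It is not the paper's algorithm (which achieves the stated rates with a more careful filtering of gradient reports); it swaps in a simple coordinate-wise median as the robust aggregation step, and all function names, the toy objective, and the parameter choices (m, alpha, lr, T) are illustrative assumptions.

```python
import numpy as np

def worker_gradients(grad_fn, x, m, alpha, rng):
    """Simulate m workers: honest ones report a stochastic gradient at x,
    while an alpha-fraction of Byzantine workers report arbitrary vectors."""
    n_byz = int(alpha * m)
    grads = [grad_fn(x, rng) for _ in range(m - n_byz)]                     # honest reports
    grads += [rng.normal(scale=100.0, size=x.shape) for _ in range(n_byz)]  # adversarial reports
    return np.stack(grads)

def robust_sgd(grad_fn, x0, m, alpha, lr=0.1, T=500, seed=0):
    """SGD where the server aggregates the m reported gradients with a
    coordinate-wise median instead of a plain mean, so a bounded fraction
    of arbitrary reports cannot steer the update."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(T):
        reports = worker_gradients(grad_fn, x, m, alpha, rng)
        g = np.median(reports, axis=0)  # robust aggregation step
        x -= lr * g
    return x

# Toy convex objective f(x) = ||x - 1||^2 / 2; its stochastic gradient is (x - 1) + noise.
grad_fn = lambda x, rng: (x - 1.0) + rng.normal(scale=0.5, size=x.shape)
print(robust_sgd(grad_fn, x0=np.zeros(5), m=20, alpha=0.2))  # approaches the all-ones minimizer
```

With a plain mean in place of the median, a single worker reporting a huge vector would dominate every update; the median illustrates why some robust aggregation rule is needed, even though the paper's own procedure is more refined.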
