
Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

2017-09-15

Sébastien M. R. Arnold, Chunming Wang

Abstract

We introduce a novel method to compute a rank-m approximation of the inverse Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results that underline the advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.
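
The abstract describes building a rank-m approximation of the inverse Hessian from differences in the parameters and gradients of multiple Workers. Below is a minimal NumPy sketch of one way such an approximation could drive a Newton-like step; the function name `approx_inverse_hessian_step`, the damping term, and the least-squares use of secant pairs are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def approx_inverse_hessian_step(deltas_w, deltas_g, grad, damping=1e-4):
    """Rank-m Newton-like step from m (parameter, gradient) difference pairs.

    deltas_w : (m, d) array of parameter differences s_i (e.g. worker minus mean)
    deltas_g : (m, d) array of gradient differences  y_i over the same pairs
    grad     : (d,)   averaged gradient at the current iterate

    Each pair approximately satisfies the secant condition y_i = H s_i, so a
    direction H^{-1} grad can be estimated in the span of the s_i by solving
    the small damped m-by-m system (Y Y^T + damping * I) c = Y grad and
    returning S^T c.
    """
    S, Y = np.asarray(deltas_w), np.asarray(deltas_g)
    G = Y @ Y.T + damping * np.eye(Y.shape[0])  # small m x m Gram matrix
    c = np.linalg.solve(G, Y @ grad)            # least-squares coefficients
    return S.T @ c                              # approximate Newton direction

# Toy check on a quadratic f(w) = 0.5 w^T A w, whose Hessian is exactly A.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0, 100.0])
w = rng.normal(size=3)
S = rng.normal(size=(3, 3))   # three hypothetical worker displacements
Y = S @ A                     # exact gradient differences: y_i = A s_i
w_next = w - approx_inverse_hessian_step(S, Y, A @ w)
print(w_next)                 # close to zero: one step solves the quadratic
```

The appeal of a rank-m construction in this setting is cost: beyond the O(md) products with S and Y, only an m-by-m system is solved, so the full d-by-d Hessian is never formed or inverted.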
