Escaping Saddle Points in Distributed Newton's Method with Communication Efficiency and Byzantine Resilience
Avishek Ghosh, Raj Kumar Maity, Arya Mazumdar, Kannan Ramchandran
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
The problem of saddle-point avoidance for non-convex optimization is quite challenging in large scale distributed learning frameworks, such as Federated Learning, especially in the presence of Byzantine workers. The celebrated cubic-regularized Newton method of nest is one of the most elegant ways to avoid saddle-points in the standard centralized (non-distributed) setup. In this paper, we extend the cubic-regularized Newton method to a distributed framework and simultaneously address several practical challenges like communication bottleneck and Byzantine attacks. Note that the issue of saddle-point avoidance becomes more crucial in the presence of Byzantine machines since rogue machines may create fake local minima near the saddle-points of the loss function, also known as the saddle-point attack. Being a second order algorithm, our iteration complexity is much lower than the first order counterparts. Furthermore we use compression (or sparsification) techniques like -approximate compression for communication efficiency. We obtain theoretical guarantees for our proposed scheme under several settings including approximate (sub-sampled) gradients and Hessians. Moreover, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks, and obtain an improvement of 25\% with respect to first order methods in iteration complexity.