
Convert, compress, correct: Three steps toward communication-efficient DNN training

2022-03-17

Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini


Abstract

In this paper, we introduce a novel algorithm, CO_3, for communication-efficient distributed Deep Neural Network (DNN) training. CO_3 is a joint training/communication protocol that encompasses three processing steps for the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial in the implementation of distributed DNN training over rate-constrained links. The interplay of these three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations over CIFAR-10.
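The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the floating-point format, the lossless coder, or the exact form of error correction. The sketch assumes NumPy's `float16` as the conversion target, `zlib` as a stand-in lossless compressor, and a simple error-feedback residual as the correction step. The function names `encode_gradient` and `decode_gradient` are hypothetical.

```python
import zlib

import numpy as np


def encode_gradient(grad, residual):
    """Worker side: correct, convert, compress (illustrative only)."""
    # Error correction (assumed here to be error feedback): add back the
    # quantization error left over from the previous round.
    corrected = grad.astype(np.float32) + residual
    # Quantization through floating-point conversion. float16 is an
    # assumption; the abstract does not name the target format.
    quantized = corrected.astype(np.float16)
    # Keep what the conversion lost so it can be re-injected next round.
    new_residual = corrected - quantized.astype(np.float32)
    # Lossless compression. zlib is a stand-in for whatever coder is used.
    payload = zlib.compress(quantized.tobytes())
    return payload, new_residual


def decode_gradient(payload, shape):
    """Server side: decompress and convert back to float32."""
    raw = zlib.decompress(payload)
    return np.frombuffer(raw, dtype=np.float16).reshape(shape).astype(np.float32)


# Usage: one communication round for a single worker.
rng = np.random.default_rng(0)
grad = rng.normal(scale=1e-2, size=(4, 8)).astype(np.float32)
residual = np.zeros_like(grad)
payload, residual = encode_gradient(grad, residual)
received = decode_gradient(payload, grad.shape)
```

In this sketch the received gradient plus the stored residual recovers the original gradient, which is the property error feedback relies on: information dropped by the conversion step is delayed rather than lost.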
