Convert, compress, correct: Three steps toward communication-efficient DNN training
Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini
Code: github.com/chen-zhong-jing/co3_algorithm (official implementation, linked in the paper; TensorFlow)
Abstract
In this paper, we introduce a novel algorithm, CO_3, for communication-efficient distributed Deep Neural Network (DNN) training. CO_3 is a joint training/communication protocol that applies three processing steps to the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. All three components are crucial for implementing distributed DNN training over rate-constrained links. The interplay of these steps in processing the DNN gradients is carefully balanced to yield a robust, high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations on CIFAR-10.
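To make the three steps concrete, the following is a minimal sketch of one worker-side round, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: the floating-point conversion is illustrated with a cast to `float16`, `zlib` stands in for the lossless compressor, and error correction is modeled as carrying the quantization residual into the next round. The function names `co3_style_encode` and `co3_style_decode` are hypothetical.

```python
import zlib
import numpy as np

def co3_style_encode(grad, memory):
    """One illustrative round of convert, compress, correct on a gradient tensor."""
    # Add back the residual left over from the previous round (assumed form of error correction).
    corrected = grad + memory
    # (i) Convert: cast to a lower-precision float format (float16 is an assumption).
    quantized = corrected.astype(np.float16)
    # (ii) Compress: lossless coding of the quantized bytes (zlib is a stand-in).
    payload = zlib.compress(quantized.tobytes())
    # (iii) Correct: store what the quantizer lost so it can be re-sent later.
    new_memory = corrected - quantized.astype(np.float32)
    return payload, new_memory

def co3_style_decode(payload, shape):
    """Receiver side: undo the lossless compression and restore float32."""
    flat = np.frombuffer(zlib.decompress(payload), dtype=np.float16)
    return flat.astype(np.float32).reshape(shape)

# Hypothetical usage with a small random "gradient".
rng = np.random.default_rng(0)
grad = rng.normal(scale=1e-2, size=(4, 8)).astype(np.float32)
memory = np.zeros_like(grad)

payload, memory = co3_style_encode(grad, memory)
decoded = co3_style_decode(payload, grad.shape)
```

In this sketch the only lossy step is the cast, so the decoded gradient plus the stored residual recovers the original gradient; the residual is what the next round would add back before quantizing again.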