Analysis of Quantized Models

2019-05-01 · ICLR 2019

Lu Hou, Ruiliang Zhang, James T. Kwok

Abstract

Weight-quantized networks require little storage and offer fast inference, but training them can still be time-consuming. Distributed learning can speed up training, though it incurs a high communication cost from worker-server synchronization. To reduce this cost, gradient quantization has recently been proposed for training networks with full-precision weights. In this paper, we theoretically study how combining weight and gradient quantization affects convergence. We show that (i) weight-quantized networks converge to an error related to the weight quantization resolution and weight dimension; (ii) quantizing gradients slows convergence by a factor related to the gradient quantization resolution and dimension; and (iii) clipping the gradient before quantization renders this factor dimension-free, thus allowing the use of fewer bits for gradient quantization. Experiments confirm the theoretical convergence results and demonstrate that quantized networks can speed up training while achieving performance comparable to that of full-precision networks.
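As a rough illustration of point (iii), the sketch below clips a gradient vector and then quantizes it with unbiased stochastic rounding. It is not taken from the paper: the function names `clip` and `stochastic_quantize`, the use of the root-mean-square of the entries as sigma, the clipping factor `c`, and the max-absolute-value scaling are all assumptions chosen to keep the example small.

```python
import math
import random


def clip(grad, c=3.0):
    """Clip each entry to [-c*sigma, c*sigma].

    Assumption: sigma is the root-mean-square of the entries of grad.
    """
    if not grad:
        return []
    sigma = math.sqrt(sum(g * g for g in grad) / len(grad))
    bound = c * sigma
    return [max(-bound, min(bound, g)) for g in grad]


def stochastic_quantize(grad, bits=2, rng=random):
    """Round grad onto a uniform grid with 2**(bits-1) - 1 steps per sign.

    Rounding is stochastic, so the expected output equals the input.
    Assumption: the grid is scaled by the largest absolute entry.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2")
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return list(grad)
    levels = 2 ** (bits - 1) - 1
    out = []
    for g in grad:
        x = abs(g) / scale * levels      # position on the grid, in [0, levels]
        lo = math.floor(x)
        p = x - lo                       # probability of rounding up
        q = lo + (1 if rng.random() < p else 0)
        sign = -1.0 if g < 0 else 1.0
        out.append(sign * q / levels * scale)
    return out


def clipped_quantize(grad, bits=2, c=3.0, rng=random):
    """Clip first, then quantize, so outliers no longer set the grid scale."""
    return stochastic_quantize(clip(grad, c), bits=bits, rng=rng)
```

The design point this is meant to show: without clipping, a single large entry sets the scale and pushes most other entries toward zero on a coarse grid; clipping first bounds the scale, which is the intuition behind using fewer bits.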

Tasks

Quantization
