Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks

2018-06-01CVPR 2018Unverified0· sign in to hype

Aojun Zhou, Anbang Yao, Kuan Wang, Yurong Chen

Unverified — Be the first to reproduce this paper.

Abstract

Benefiting from tens of millions of hierarchically stacked learnable parameters, Deep Neural Networks (DNNs) have demonstrated overwhelming accuracy on a variety of artificial intelligence tasks. However reversely, the large size of DNN models lays a heavy burden on storage, computation and power consumption, which prohibits their deployments on the embedded and mobile systems. In this paper, we propose Explicit Loss-error-aware Quantization (ELQ), a new method that can train DNN models with very low-bit parameter values such as ternary and binary ones to approximate 32-bit floating-point counterparts without noticeable loss of predication accuracy. Unlike existing methods that usually pose the problem as a straightforward approximation of the layer-wise weights or outputs of the original full-precision model (specifically, minimizing the error of the layer-wise weights or inner products of the weights and the inputs between the original and respective quantized models), our ELQ elaborately bridges the loss perturbation from the weight quantization and an incremental quantization strategy to address DNN quantization. Through explicitly regularizing the loss perturbation and the weight approximation error in an incremental way, we show that such a new optimization method is theoretically reasonable and practically effective. As validated with two mainstream convolutional neural network families (i.e., fully convolutional and non-fully convolutional), our ELQ shows better results than the state-of-the-art quantization methods on the large scale ImageNet classification dataset. Code will be made publicly available.

Tasks

Quantization

Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks

Abstract

Tasks

Reproductions