Computation-Efficient Quantization Method for Deep Neural Networks

2018-09-27

Parichay Kapoor, Dongsoo Lee, Byeongwook Kim, Saehyung Lee

Abstract

Deep Neural Networks are memory- and computation-intensive, which makes them challenging to deploy on smaller devices. Numerous quantization techniques have been proposed to reduce inference latency and memory consumption, but they impose a large overhead on the training procedure or require changes to the training process. We present a non-intrusive quantization technique based on re-training the full-precision model, followed by directly optimizing the corresponding binary model. The quantization training process takes no longer than the original training process. We also propose a new loss function that regularizes the weights, reducing quantization error. Combining the two lets us match full-precision accuracy on the CIFAR dataset with binary quantization. We also match full-precision accuracy on WikiText-2 using 2-bit quantization, and show comparable results for ImageNet. Finally, we present a 1.5-bit hybrid model that exceeds the performance of the TWN LSTM model on WikiText-2.
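The abstract does not spell out the binarization scheme, but a common formulation (used, e.g., in XNOR-Net-style binary networks) maps each weight to a shared scale times its sign, and a quantization-aware regularizer penalizes the distance between full-precision and binarized weights. The sketch below illustrates that general idea; the function names and the exact penalty form are assumptions, not the paper's definitions.

```python
import numpy as np

def binarize(w):
    # Binary quantization sketch: replace each weight with alpha * sign(w),
    # where alpha = mean(|w|) is the least-squares-optimal per-tensor scale.
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def quantization_penalty(w):
    # Hypothetical regularizer in the spirit of the paper's proposed loss:
    # penalize the squared distance between full-precision weights and their
    # binary counterparts, nudging weights toward quantizable values.
    return np.sum((w - binarize(w)) ** 2)

w = np.array([1.0, -2.0, 3.0])
print(binarize(w))               # [ 2. -2.  2.]  (alpha = 2.0)
print(quantization_penalty(w))   # 2.0
```

Adding such a penalty to the task loss during full-precision re-training keeps weights near their binary projections, so the subsequent switch to the binary model loses little accuracy.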
