Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
Code
- github.com/analogdevicesinc/ai8x-training (PyTorch, ★ 115)
- github.com/MaximIntegratedAI/ai8x-synthesis (PyTorch, ★ 64)
- github.com/hey-yahei/Quantization.MXNet (MXNet, ★ 0)
- github.com/Qengineering/TensorFlow_Lite_Classification_Jetson-Nano (TensorFlow, ★ 0)
- github.com/jameszampa/VIP-SoCET-Benchmark (TensorFlow, ★ 0)
- github.com/linyang-zhh/FQ-ViT (PyTorch, ★ 0)
- github.com/PhilipPfeffer/haptic_vest (TensorFlow, ★ 0)
- github.com/Qengineering/TensorFlow_Lite_RPi_64-bits (TensorFlow, ★ 0)
- github.com/Janus-Shiau/awd-lstm-tensorflow (TensorFlow, ★ 0)
- github.com/jameszampa/ECE-570-Implementation (TensorFlow, ★ 0)
Abstract
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.
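The quantization scheme the paper proposes represents each real value as r = S(q − Z), where q is an 8-bit integer, S is a floating-point scale, and Z is an integer zero-point chosen so that the real value 0 is exactly representable. A minimal NumPy sketch of this affine quantize/dequantize mapping (function names and the min/max-based parameter choice are illustrative, not the paper's exact training-time implementation):

```python
import numpy as np

def choose_quant_params(r_min, r_max, num_bits=8):
    """Pick (scale, zero_point) for the affine mapping r = scale * (q - zero_point).

    The range is widened to include 0.0 so that zero (e.g. zero-padding)
    quantizes with no error, as the paper's scheme requires.
    """
    r_min = min(r_min, 0.0)
    r_max = max(r_max, 0.0)
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep Z in the integer range
    return scale, zero_point

def quantize(r, scale, zero_point, num_bits=8):
    """Map real values to uint8 codes: q = clamp(round(r / scale) + Z)."""
    q = np.round(np.asarray(r, dtype=np.float32) / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate real values: r = scale * (q - Z)."""
    return scale * (q.astype(np.float32) - zero_point)
```

At inference time only the integer codes q and the pair (S, Z) per tensor are needed; matrix multiplies can then be carried out in integer arithmetic, with S folded into a fixed-point rescaling of the accumulator.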