McQueen : Mixed Precision Quantization of Early Exit Networks
Utkarsh Saxena; Kaushik Roy
- Code: github.com/UtkarshSaxena1/McQueen (PyTorch)
Abstract
Mixed precision quantization offers a promising way to obtain an optimal tradeoff between model complexity and accuracy. However, most quantization techniques do not support input-adaptive execution of neural networks, resulting in a fixed computational cost for every instance in a dataset. Early exit networks, in contrast, augment traditional architectures with multiple exit classifiers and spend varying computational effort depending on instance complexity, thereby reducing computational cost. In this work, we propose McQueen, a mixed precision quantization technique for early exit networks. Specifically, we develop a Parametric Differentiable Quantizer (PDQ) that learns the quantizer precision, threshold, and scaling factor during training. Further, we propose a gradient masking technique that facilitates the joint optimization of the exit and final classifiers while learning the PDQ and network parameters. Extensive experiments on a variety of datasets demonstrate that our method achieves a significant reduction in Bit Operations (BOPs) while maintaining the top-1 accuracy of the original floating-point model. In particular, McQueen reduces BOPs by 109x relative to the floating-point baseline without accuracy degradation on ResNet-18 trained on ImageNet.
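To make the quantization setting concrete, below is a minimal NumPy sketch of the uniform symmetric quantizer that mixed precision methods build on. It is an illustrative stand-in, not the paper's PDQ: in McQueen the scaling factor, threshold, and bit-width are themselves learnable parameters trained with gradients (via differentiable relaxations), whereas here they are fixed inputs.

```python
import numpy as np

def uniform_quantize(x, scale, bits):
    """Uniform symmetric fake-quantization: quantize to a signed
    `bits`-bit integer grid, then dequantize back to floats.

    Illustrative only -- in the paper's PDQ, `scale` and `bits`
    (and the clipping threshold) are learned during training.
    """
    qmax = 2 ** (bits - 1) - 1          # largest representable level
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                    # dequantized approximation

x = np.array([0.1, -0.3, 0.7])
print(uniform_quantize(x, scale=0.1, bits=4))  # fine grid: values preserved
print(uniform_quantize(x, scale=0.1, bits=2))  # coarse grid: values clipped
```

Lowering `bits` shrinks the representable range and coarsens the grid, which is exactly the precision/accuracy tradeoff that a mixed precision method must balance per layer.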