McQueen : Mixed Precision Quantization of Early Exit Networks
Utkarsh Saxena; Kaushik Roy
- Code: github.com/UtkarshSaxena1/McQueen (PyTorch)
Abstract
Mixed precision quantization offers a promising way to obtain an optimal tradeoff between model complexity and accuracy. However, most quantization techniques do not support input-adaptive execution of neural networks, resulting in a fixed computational cost for every instance in a dataset. Early exit networks, in contrast, augment traditional architectures with multiple exit classifiers and spend varying computational effort depending on instance complexity, thereby reducing computational cost. In this work, we propose McQueen, a mixed precision quantization technique for early exit networks. Specifically, we develop a Parametric Differentiable Quantizer (PDQ) that learns the quantizer precision, threshold, and scaling factor during training. Further, we propose a gradient masking technique that facilitates the joint optimization of the exit and final classifiers while learning the PDQ and network parameters. Extensive experiments on a variety of datasets demonstrate that our method achieves a significant reduction in Bit Operations (BOPs) while maintaining the top-1 accuracy of the original floating-point model. In particular, McQueen reduces BOPs by 109x relative to the floating-point baseline without accuracy degradation on ResNet-18 trained on ImageNet.
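To make the quantization setting concrete, below is a minimal NumPy sketch of the uniform symmetric quantizer that mixed precision methods build on. It is an illustrative stand-in, not the paper's PDQ: in McQueen the scaling factor, threshold, and bit-width are themselves learnable parameters trained with gradients (via differentiable relaxations), whereas here they are fixed inputs.

```python
import numpy as np

def uniform_quantize(x, scale, bits):
    """Uniform symmetric fake-quantization: quantize to a signed
    `bits`-bit integer grid, then dequantize back to floats.

    Illustrative only -- in the paper's PDQ, `scale` and `bits`
    (and the clipping threshold) are learned during training.
    """
    qmax = 2 ** (bits - 1) - 1          # largest representable level
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                    # dequantized approximation

x = np.array([0.1, -0.3, 0.7])
print(uniform_quantize(x, scale=0.1, bits=4))  # fine grid: values preserved
print(uniform_quantize(x, scale=0.1, bits=2))  # coarse grid: values clipped
```

Lowering `bits` shrinks the representable range and coarsens the grid, which is exactly the precision/accuracy tradeoff that a mixed precision method must balance per layer.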