Oscillation-free Quantization for Low-bit Vision Transformers

2023-02-04

Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng


Abstract

Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely-used de facto setting in quantization, aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation, using ViT as a case study to illustrate our findings and remedies. We also find that the interdependence between the quantized weights of the query and key in a self-attention layer makes ViT vulnerable to oscillation. We therefore propose three techniques: statistical weight quantization (StatsQ) to improve quantization robustness over the prevalent learnable-scale-based method; confidence-guided annealing (CGA), which freezes weights with high confidence and calms the oscillating weights; and query-key reparameterization (QKR) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation. Extensive experiments demonstrate that these techniques successfully abate weight oscillation and consistently yield substantial accuracy improvements on ImageNet. Specifically, our 2-bit DeiT-T/DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively. Code and models are available at: https://github.com/nbasyl/OFQ.
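To make the core idea of StatsQ concrete, the sketch below derives the quantization scale from the weight tensor's statistics (here, the mean absolute value) rather than learning it by gradient descent, so the scale cannot drift and push weights back and forth across a quantization threshold. The specific statistic and rounding scheme are assumptions for illustration; the actual OFQ implementation may differ.

```python
import numpy as np

def statsq_quantize(w, bits=2):
    """Statistics-based symmetric weight quantization (StatsQ-style sketch).

    The scale is computed from the weights themselves (assumed here to be
    the mean absolute value), not stored as a learnable parameter, which
    avoids the scale-driven oscillation the paper attributes to
    learnable-scale quantizers.
    """
    n_levels = 2 ** (bits - 1)                 # signed levels, e.g. 2 bits -> [-2, 1]
    scale = np.abs(w).mean()                   # statistics-derived scale (assumption)
    q = np.clip(np.round(w / scale), -n_levels, n_levels - 1)
    return q * scale                           # dequantized (fake-quantized) weights

w = np.random.randn(64, 64).astype(np.float32)
wq = statsq_quantize(w, bits=2)
```

Because the scale is a deterministic function of the weights, re-quantizing the same tensor always yields the same levels; in training, only the underlying latent weights are updated.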
