P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

2024-05-30Code Available1· sign in to hype

Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang

Code Available — Be the first to reproduce this paper.

Code

github.com/shihuihong214/p2-vit
OfficialIn paperpytorch★ 10

Abstract

Vision Transformers (ViTs) have excelled in computer vision tasks but are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quantization algorithms but retained floating-point scaling factors, which yield non-negligible re-quantization overhead, limiting ViTs' hardware efficiency and motivating more hardware-friendly solutions. To this end, we propose P^2-ViT, the first Power-of-Two (PoT) post-training quantization and acceleration framework to accelerate fully quantized ViTs. Specifically, as for quantization, we explore a dedicated quantization scheme to effectively quantize ViTs with PoT scaling factors, thus minimizing the re-quantization overhead. Furthermore, we propose coarse-to-fine automatic mixed-precision quantization to enable better accuracy-efficiency trade-offs. In terms of hardware, we develop a dedicated chunk-based accelerator featuring multiple tailored sub-processors to individually handle ViTs' different types of operations, alleviating reconfigurable overhead. Additionally, we design a tailored row-stationary dataflow to seize the pipeline processing opportunity introduced by our PoT scaling factors, thereby enhancing throughput. Extensive experiments consistently validate P^2-ViT's effectiveness. Particularly, we offer comparable or even superior quantization performance with PoT scaling factors when compared to the counterpart with floating-point scaling factors. Besides, we achieve up to 10.1 speedup and 36.8 energy saving over GPU's Turing Tensor Cores, and up to 1.84 higher computation utilization efficiency against SOTA quantization-based ViT accelerators. Codes are available at https://github.com/shihuihong214/P2-ViT.

Tasks

Quantization

P^2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

Code

Abstract

Tasks

Reproductions