DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge
Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, Adnan Siraj Rakin
Abstract
Vision Transformers (ViTs) excel at complex vision tasks, yet their substantial size poses significant challenges for deployment on resource-constrained edge devices. The large size of these models leads to higher overhead (e.g., energy, latency) when transmitting model weights between the edge device and the server. Hence, ViTs are not ideal for edge devices where the entire model may not fit on the device. Current model compression techniques often achieve high compression ratios at the expense of performance degradation, particularly for ViTs. To overcome the limitations of existing works, we rethink the model compression strategy for ViTs from a first-principles approach and develop an orthogonal strategy called DeepCompress-ViT. The objective of DeepCompress-ViT is to encode the model weights into a highly compressed representation using a novel training method, denoted as Unified Compression Training (UCT). The proposed UCT is accompanied by a decoding mechanism during inference, which helps recover any accuracy lost due to the high compression ratio. We further optimize this decoding step by re-ordering the decoding operations using the associative property of matrix multiplication, ensuring that the compressed weights can be decoded during inference without incurring any computational overhead. Our extensive experiments across multiple ViT models on modern edge devices show that DeepCompress-ViT can compress ViTs at high compression ratios (>14x). DeepCompress-ViT enables the entire model to be stored on the edge device, resulting in unprecedented reductions in energy consumption (>1470x) and latency (>68x) for edge ViT inference. Our code is available at https://github.com/ML-Security-Research-LAB/DeepCompress-ViT.
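The associativity re-ordering mentioned in the abstract can be illustrated with a minimal NumPy sketch. This assumes, purely for illustration, that a compressed weight is stored as a low-rank factor pair `(U, V)` with `W ≈ U @ V`; the paper's actual encoding may differ, but the re-ordering argument is the same: `(U @ V) @ x == U @ (V @ x)`, so inference can apply the compressed factors directly and never materialize the full weight matrix.

```python
import numpy as np

# Illustrative shapes (not from the paper): a d x d weight encoded as
# U (d x r) and V (r x d) with r << d.
d, r = 512, 32
rng = np.random.default_rng(0)
U = rng.standard_normal((d, r))
V = rng.standard_normal((r, d))
x = rng.standard_normal((d,))

# Naive decode-then-multiply: first materialize the full d x d weight
# (O(d^2 * r) decode cost, O(d^2) memory), then apply it to the input.
y_naive = (U @ V) @ x

# Re-ordered via associativity of matrix multiplication: apply the
# compressed factors directly, O(d * r) per input, no decoded W in memory.
y_reordered = U @ (V @ x)

# Both orderings give the same result (up to floating-point error).
assert np.allclose(y_naive, y_reordered)
```

The same re-ordering applies when `x` is a batch of activations: the decode cost is folded into the ordinary matrix multiplications of the forward pass rather than paid as a separate decompression step.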