UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers

2023-01-31

Dachuan Shi, Chaofan Tao, Ying Jin, Zhendong Yang, Chun Yuan, Jiaqi Wang


Code

github.com/sdc17/upop
Official, in paper, PyTorch, ★ 105
github.com/sdc17/crossget
PyTorch, ★ 34

Abstract

Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, the growing size of models such as Transformers has drawn researchers' attention to model compression. However, how to compress multimodal models, especially vision-language Transformers, is still under-explored. This paper proposes Unified and Progressive Pruning (UPop), a universal vision-language Transformer compression framework that incorporates 1) unified search of multimodal subnets in a continuous optimization space derived from the original model, which enables automatic assignment of pruning ratios across compressible modalities and structures; and 2) progressive search and retraining of the subnet, which maintains convergence between search and retraining so that higher compression ratios can be attained. Experiments on various tasks, datasets, and model architectures demonstrate the effectiveness and versatility of the proposed UPop framework. The code is available at https://github.com/sdc17/UPop.
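To make the idea of "automatic assignment of pruning ratios" concrete, here is a minimal sketch, not the UPop implementation. It assumes each compressible structure (e.g. an attention head or FFN neuron) already has a learned continuous importance score; how those scores are learned, and the interleaved retraining, are omitted. All structures from every modality are ranked together against a single global ratio, so per-modality ratios emerge rather than being hand-set. The names `progressive_ratio`, `unified_prune`, `group_ratios`, the group keys, the scores, and the linear ramp schedule are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def progressive_ratio(step: int, total_steps: int, target_ratio: float) -> float:
    """Hypothetical linear schedule: the global pruning ratio ramps from 0 to target_ratio."""
    if total_steps <= 0:
        return target_ratio
    return target_ratio * min(1.0, step / total_steps)


def unified_prune(scores: Dict[str, List[float]], ratio: float) -> Dict[str, List[bool]]:
    """Rank all structures from all modalities in one list and prune the lowest `ratio` fraction.

    Returns a keep-mask (True = keep) per group. No per-group ratio is specified;
    each group's ratio follows from where its scores fall in the global ranking.
    """
    flat: List[Tuple[float, str, int]] = [
        (score, name, idx) for name, vals in scores.items() for idx, score in enumerate(vals)
    ]
    n_prune = int(round(ratio * len(flat)))
    flat.sort(key=lambda item: item[0])  # ascending: least important structures first
    pruned: Set[Tuple[str, int]] = {(name, idx) for _, name, idx in flat[:n_prune]}
    return {
        name: [(name, idx) not in pruned for idx in range(len(vals))]
        for name, vals in scores.items()
    }


def group_ratios(masks: Dict[str, List[bool]]) -> Dict[str, float]:
    """Fraction of structures pruned in each group."""
    return {name: 1.0 - sum(mask) / len(mask) for name, mask in masks.items()}


if __name__ == "__main__":
    # Hypothetical learned importance scores for two compressible structure groups.
    scores = {
        "vision.attn_heads": [0.9, 0.1, 0.15, 0.05],
        "text.ffn_neurons": [0.7, 0.6, 0.2, 0.95, 0.3, 0.4],
    }
    total_steps, target = 3, 0.6
    for step in range(total_steps + 1):
        ratio = progressive_ratio(step, total_steps, target)
        masks = unified_prune(scores, ratio)
        print(step, round(ratio, 2), group_ratios(masks))
```

With these made-up scores and a single global target of 0.6, the vision group ends up 75% pruned and the text group 50% pruned. Ramping the ratio gradually, rather than jumping straight to the target, is the sketch's stand-in for the progressive aspect.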

Tasks

Image Captioning, Image Classification, Image Retrieval, Image-text Retrieval, Image-to-Text Retrieval, Model Compression, Network Pruning, Semantic Segmentation, Text Retrieval, Visual Question Answering (VQA), Visual Reasoning
