FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan
Code
- github.com/rwightman/pytorch-image-models (Official, In paper, pytorch, ★ 36,538)
- github.com/apple/ml-fastvit (Official, In paper, pytorch, ★ 1,991)
- github.com/balala8/FastViT_pytorch (pytorch, ★ 26)
- gitlab.com/birder/birder (pytorch, ★ 0)
- github.com/MindCode-4/code-1/tree/main/vit_hybrid (mindspore, ★ 0)
- github.com/leondgarse/keras_cv_attention_models/tree/main/keras_cv_attention_models/fastvit (tf, ★ 0)
Abstract
The recent amalgamation of transformer and convolutional designs has led to steady improvements in model accuracy and efficiency. In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains a state-of-the-art latency-accuracy trade-off. To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network. We further apply train-time overparametrization and large kernel convolutions to boost accuracy, and empirically show that these choices have minimal effect on latency. We show that our model is 3.5x faster than CMT, a recent state-of-the-art hybrid transformer architecture, 4.9x faster than EfficientNet, and 1.9x faster than ConvNeXt on a mobile device for the same accuracy on the ImageNet dataset. At similar latency, our model obtains 4.2% better Top-1 accuracy on ImageNet than MobileOne. Our model consistently outperforms competing architectures across several tasks (image classification, detection, segmentation, and 3D mesh regression), with significant improvement in latency on both a mobile device and a desktop GPU. Furthermore, our model is highly robust to out-of-distribution samples and corruptions, improving over competing robust models. Code and models are available at https://github.com/apple/ml-fastvit.
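The idea of removing a skip-connection through structural reparameterization can be illustrated with a small example. The following is a minimal sketch in plain Python, not the RepMixer implementation: it assumes a single-channel 3x3 convolution with an identity skip and omits normalization and any other layers. All function names here are hypothetical. Because convolution is linear, the identity branch can be folded into the kernel by adding 1 to its centre weight, so the two-branch training form and the single-branch inference form compute the same output.

```python
# Illustrative sketch only; names and shapes are assumptions, not FastViT code.

def depthwise_conv3x3(x, kernel):
    """Apply a 3x3 kernel to one channel (2D list) with zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def train_time_block(x, kernel):
    """Two-branch form used during training: y = x + conv(x)."""
    y = depthwise_conv3x3(x, kernel)
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def reparameterize(kernel):
    """Fold the identity skip into the kernel (add 1 to the centre weight)."""
    fused = [row[:] for row in kernel]
    fused[1][1] += 1.0
    return fused


def inference_block(x, fused_kernel):
    """Single-branch form used at inference: y = conv_fused(x)."""
    return depthwise_conv3x3(x, fused_kernel)


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]

y_train = train_time_block(x, kernel)
y_infer = inference_block(x, reparameterize(kernel))

# Largest element-wise difference between the two forms (floating-point noise).
max_diff = max(abs(a - b)
               for row_a, row_b in zip(y_train, y_infer)
               for a, b in zip(row_a, row_b))
```

In this sketch the fused form reads the input once instead of keeping it around for the addition, which is the kind of memory access saving the abstract attributes to removing skip-connections.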
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| FreiHAND | FastViT-MA36 | PA-MPJPE | 6.6 | — | Unverified |