MambaVision: A Hybrid Mamba-Transformer Vision Backbone

2024-07-10 · CVPR 2025 · Code Available

Ali Hatamizadeh, Jan Kautz


Code

github.com/nvlabs/mambavision (Official, In paper · pytorch · ★ 2,074)
github.com/hashmatshadab/mambarobustness (pytorch · ★ 26)
github.com/jiaowoguanren0615/MambaVision (pytorch · ★ 23)

Abstract

We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications. Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features. Through a comprehensive ablation study, we demonstrate the feasibility of integrating Vision Transformers (ViT) with Mamba. Our results show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies. Based on these findings, we introduce a family of MambaVision models with a hierarchical architecture to meet various design criteria. For classification on the ImageNet-1K dataset, MambaVision variants achieve state-of-the-art (SOTA) performance in terms of both Top-1 accuracy and throughput. In downstream tasks such as object detection, instance segmentation, and semantic segmentation on MS COCO and ADE20K datasets, MambaVision outperforms comparably sized backbones while demonstrating favorable performance. Code: https://github.com/NVlabs/MambaVision
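The abstract's central design point, placing self-attention blocks in the final layers of a Mamba-based stage, can be illustrated with a small sketch. This is not the authors' implementation: the function name `hybrid_stage_layout` and its parameters `depth` and `num_attention` are hypothetical, and the abstract does not state the actual per-stage split, so the numbers below are placeholders.

```python
from typing import List


def hybrid_stage_layout(depth: int, num_attention: int) -> List[str]:
    """Return the block type of each layer in one stage.

    Mamba-style mixer blocks come first and self-attention blocks fill the
    final `num_attention` positions. Both arguments are hypothetical knobs
    chosen for illustration, not values taken from the paper.
    """
    if depth < 0 or not 0 <= num_attention <= depth:
        raise ValueError("need 0 <= num_attention <= depth")
    num_mamba = depth - num_attention
    return ["mamba"] * num_mamba + ["attention"] * num_attention


# Example: an 8-layer stage whose last 4 layers use self-attention (assumed split).
layout = hybrid_stage_layout(depth=8, num_attention=4)
print(layout)
```

In a real model each string would be replaced by the corresponding block module; the official repository linked above is the reference for the actual layer arrangement.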

Tasks

Image Classification, Instance Segmentation, Mamba, Object Detection, Semantic Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet	MambaVision-L3	Top 1 Accuracy	88.1	—	Unverified
ImageNet	MambaVision-L	Top 1 Accuracy	85	—	Unverified
ImageNet	MambaVision-B	Top 1 Accuracy	84.2	—	Unverified
ImageNet	MambaVision-S	Top 1 Accuracy	83.3	—	Unverified
ImageNet	MambaVision-T2	Top 1 Accuracy	82.7	—	Unverified
ImageNet	MambaVision-T	Top 1 Accuracy	82.3	—	Unverified
