VMamba: Visual State Space Model

2024-01-18

Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, YaoWei Wang, Qixiang Ye, Jianbin Jiao, Yunfan Liu

Code

github.com/mzeromiko/vmamba · Official · In paper · pytorch · ★ 3,080
github.com/chenhongruixuan/mambacd · none · ★ 575
github.com/hunto/localmamba · pytorch · ★ 278
github.com/yuhengsss/msvmamba · pytorch · ★ 80
github.com/raytrun/mamba-clip · pytorch · ★ 80
github.com/zs1314/skinmamba · pytorch · ★ 51
github.com/weitunglin/pixmamba · pytorch · ★ 40
github.com/longshaocong/dgmamba · pytorch · ★ 37
github.com/zs1314/octamamba · pytorch · ★ 34
github.com/AmazingDay1/TAME · pytorch · ★ 26

Abstract

Designing computationally efficient network architectures remains an ongoing necessity in computer vision. In this paper, we adapt Mamba, a state-space language model, into VMamba, a vision backbone with linear time complexity. At the core of VMamba is a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module. By traversing along four scanning routes, SS2D bridges the gap between the ordered nature of 1D selective scan and the non-sequential structure of 2D vision data, which facilitates the collection of contextual information from various sources and perspectives. Based on the VSS blocks, we develop a family of VMamba architectures and accelerate them through a succession of architectural and implementation enhancements. Extensive experiments demonstrate VMamba's promising performance across diverse visual perception tasks, highlighting its superior input scaling efficiency compared to existing benchmark models. Source code is available at https://github.com/MzeroMiko/VMamba.
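The four-route traversal described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the four routes are row-major, column-major, and the reverse of each, and it replaces the learned selective scan with a hypothetical fixed-decay recurrence (`toy_scan`). The function names `cross_scan`, `toy_scan`, and `ss2d_sketch` are invented for this illustration.

```python
def cross_scan(height, width):
    """Return four visiting orders over an H x W grid as lists of (row, col).

    Assumed routes: row-major, column-major, and the reverse of each.
    """
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def toy_scan(values, decay=0.5):
    """Stand-in for a 1D selective scan: a causal recurrence with fixed decay.

    The real module uses input-dependent parameters; this only mimics the
    left-to-right, one-step-per-element structure.
    """
    outputs, state = [], 0.0
    for value in values:
        state = decay * state + value
        outputs.append(state)
    return outputs


def ss2d_sketch(grid, decay=0.5):
    """Scan a 2D grid along four routes and sum the results per cell."""
    height, width = len(grid), len(grid[0])
    merged = [[0.0] * width for _ in range(height)]
    for route in cross_scan(height, width):
        sequence = [grid[r][c] for r, c in route]        # unfold 2D -> 1D
        scanned = toy_scan(sequence, decay)              # 1D causal scan
        for (r, c), y in zip(route, scanned):            # fold 1D -> 2D
            merged[r][c] += y
    return merged


if __name__ == "__main__":
    print(ss2d_sketch([[1, 2], [3, 4]], decay=1.0))
```

Each route visits every cell exactly once, so the total work is four passes over the H x W grid, which grows linearly with the number of cells. Because a forward route and its reverse together cover everything before and after a given position, every output cell receives a contribution from every input cell even though each individual scan is causal.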

Tasks

Computational Efficiency, Language Modeling, Mamba model, Representation Learning
