| SAM 2: Segment Anything in Images and Videos | Aug 1, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 11 |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance Segmentation, Language Modeling | Code Available | 9 |
| Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Jan 19, 2024 | Data Augmentation, Depth Estimation | Code Available | 9 |
| Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Dec 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 7 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 |
| MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Jul 10, 2024 | Image Classification, Instance Segmentation | Code Available | 7 |
| MambaOut: Do We Really Need Mamba for Vision? | May 13, 2024 | image-classification, Image Classification | Code Available | 7 |
| Bilateral Reference for High-Resolution Dichotomous Image Segmentation | Jan 7, 2024 | Camouflaged Object Segmentation, Dichotomous Image Segmentation | Code Available | 7 |
| U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation | Nov 29, 2023 | Computational Efficiency, Decoder | Code Available | 6 |
| Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | Fairness, Image Classification | Code Available | 6 |
| DINOv2: Learning Robust Visual Features without Supervision | Apr 14, 2023 | Depth Estimation, Domain Generalization | Code Available | 6 |
| NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification | May 22, 2025 | 2D Semantic Segmentation, Activity Prediction | Code Available | 5 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Aug 23, 2024 | Image Segmentation, Segmentation | Code Available | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Aug 8, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 5 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Jun 16, 2024 | Decoder, Earth Observation | Code Available | 5 |
| 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Jun 13, 2024 | Instance Segmentation, multimodal generation | Code Available | 5 |
| Matching Anything by Segmenting Anything | Jun 6, 2024 | Domain Generalization, Multiple Object Tracking | Code Available | 5 |
| FeatUp: A Model-Agnostic Framework for Features at Any Resolution | Mar 15, 2024 | Depth Estimation, Depth Prediction | Code Available | 5 |
| OMG-Seg: Is One Model Good Enough For All Segmentation? | Jan 18, 2024 | All, Decoder | Code Available | 5 |
| Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Jan 7, 2024 | Benchmarking, Image Segmentation | Code Available | 5 |
| YOLOR-Based Multi-Task Learning | Sep 29, 2023 | Image Captioning, Instance Segmentation | Code Available | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Jun 25, 2023 | CPU, Decoder | Code Available | 5 |
| Infinite Photorealistic Worlds using Procedural Generation | Jun 15, 2023 | 3D Reconstruction, object-detection | Code Available | 5 |
| Track Anything: Segment Anything Meets Videos | Apr 24, 2023 | Image Segmentation, Object Tracking | Code Available | 5 |
| Segment Anything | Apr 5, 2023 | Event-based Object Segmentation, Image Segmentation | Code Available | 5 |
| A ConvNet for the 2020s | Jan 10, 2022 | Classification, Domain Generalization | Code Available | 5 |
| Attention on the Sphere | May 16, 2025 | Depth Estimation, Image Segmentation | Code Available | 4 |
| Your ViT is Secretly an Image Segmentation Model | Mar 24, 2025 | Decoder, Image Segmentation | Code Available | 4 |
| Sonata: Self-Supervised Learning of Reliable Point Representations | Mar 20, 2025 | 3D Semantic Segmentation, Self-Supervised Learning | Code Available | 4 |
| OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Feb 27, 2025 | Image Classification, Instance Segmentation | Code Available | 4 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 |
| EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Aug 21, 2024 | 3D Instance Segmentation, GPU | Code Available | 4 |
| Medical SAM 2: Segment medical images as video via Segment Anything Model 2 | Aug 1, 2024 | Image Segmentation, Interactive Segmentation | Code Available | 4 |
| PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Jun 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 4 |
| LSKNet: A Foundation Lightweight Backbone for Remote Sensing | Mar 18, 2024 | Change Detection, object-detection | Code Available | 4 |
| Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation | Feb 16, 2024 | Cardiac Segmentation, Decoder | Code Available | 4 |
| Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | Feb 11, 2024 | Cardiac Segmentation, Contrastive Learning | Code Available | 4 |
| Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation | Feb 7, 2024 | Cardiac Segmentation, Computational Efficiency | Code Available | 4 |
| InstanceDiffusion: Instance-level Control for Image Generation | Feb 5, 2024 | Conditional Text-to-Image Synthesis, Image Generation | Code Available | 4 |
| SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation | Jan 24, 2024 | Image Segmentation, Mamba | Code Available | 4 |
| Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering | Jan 12, 2024 | 3D Panoptic Segmentation, 3D Semantic Segmentation | Code Available | 4 |
| Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications | Jan 11, 2024 | image-classification, Image Classification | Code Available | 4 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance Segmentation, Language Modeling | Code Available | 4 |
| EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Dec 1, 2023 | Decoder, image-classification | Code Available | 4 |
| LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing | Nov 1, 2023 | All, Image Generation | Code Available | 4 |
| 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers | Oct 11, 2023 | Decoder, Image Segmentation | Code Available | 4 |