| SAM 2: Segment Anything in Images and Videos | Aug 1, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 11 | 5 |
| Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Jan 19, 2024 | Data Augmentation, Depth Estimation | Code Available | 9 | 5 |
| YOLO-World: Real-Time Open-Vocabulary Object Detection | Jan 30, 2024 | Instance Segmentation, Language Modeling | Code Available | 9 | 5 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 | 5 |
| MambaOut: Do We Really Need Mamba for Vision? | May 13, 2024 | image-classification, Image Classification | Code Available | 7 | 5 |
| Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Dec 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 7 | 5 |
| MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Jul 10, 2024 | Image Classification, Instance Segmentation | Code Available | 7 | 5 |
| Bilateral Reference for High-Resolution Dichotomous Image Segmentation | Jan 7, 2024 | Camouflaged Object Segmentation, Dichotomous Image Segmentation | Code Available | 7 | 5 |
| U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation | Nov 29, 2023 | Computational Efficiency, Decoder | Code Available | 6 | 5 |
| DINOv2: Learning Robust Visual Features without Supervision | Apr 14, 2023 | Depth Estimation, Domain Generalization | Code Available | 6 | 5 |
| Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | Jul 12, 2023 | Fairness, Image Classification | Code Available | 6 | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Jun 25, 2023 | CPU, Decoder | Code Available | 5 | 5 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 | 5 |
| FeatUp: A Model-Agnostic Framework for Features at Any Resolution | Mar 15, 2024 | Depth Estimation, Depth Prediction | Code Available | 5 | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Aug 8, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 5 | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 | 5 |
| Segment Anything | Apr 5, 2023 | Event-based Object Segmentation, Image Segmentation | Code Available | 5 | 5 |
| Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Jan 7, 2024 | Benchmarking, Image Segmentation | Code Available | 5 | 5 |
| 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Jun 13, 2024 | Instance Segmentation, multimodal generation | Code Available | 5 | 5 |
| A ConvNet for the 2020s | Jan 10, 2022 | Classification, Domain Generalization | Code Available | 5 | 5 |
| Matching Anything by Segmenting Anything | Jun 6, 2024 | Domain Generalization, Multiple Object Tracking | Code Available | 5 | 5 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification | May 22, 2025 | 2D Semantic Segmentation, Activity Prediction | Code Available | 5 | 5 |
| PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Jun 16, 2024 | Decoder, Earth Observation | Code Available | 5 | 5 |
| OMG-Seg: Is One Model Good Enough For All Segmentation? | Jan 18, 2024 | All, Decoder | Code Available | 5 | 5 |