| SAM 2: Segment Anything in Images and Videos | Aug 1, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 11 | 5 |
| Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks | Jan 25, 2024 | Segmentation | Code Available | 9 | 5 |
| Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion | Feb 21, 2022 | Binarization, Model Optimization | Code Available | 7 | 5 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 | 5 |
| Efficient MedSAMs: Segment Anything in Medical Images on Laptop | Dec 20, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 7 | 5 |
| Segment Anything in Medical Images and Videos: Benchmark and Deployment | Aug 6, 2024 | Benchmarking, Segmentation | Code Available | 7 | 5 |
| U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation | Nov 29, 2023 | Computational Efficiency, Decoder | Code Available | 6 | 5 |
| Track Anything: Segment Anything Meets Videos | Apr 24, 2023 | Image Segmentation, Object Tracking | Code Available | 5 | 5 |
| OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Jun 27, 2024 | Decoder, Segmentation | Code Available | 5 | 5 |
| OMG-Seg: Is One Model Good Enough For All Segmentation? | Jan 18, 2024 | All, Decoder | Code Available | 5 | 5 |
| PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery | Jun 16, 2024 | Decoder, Earth Observation | Code Available | 5 | 5 |
| Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Aug 23, 2024 | Image Segmentation, Segmentation | Code Available | 5 | 5 |
| Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions | Jan 7, 2024 | Benchmarking, Image Segmentation | Code Available | 5 | 5 |
| YOLOR-Based Multi-Task Learning | Sep 29, 2023 | Image Captioning, Instance Segmentation | Code Available | 5 | 5 |
| FeatUp: A Model-Agnostic Framework for Features at Any Resolution | Mar 15, 2024 | Depth Estimation, Depth Prediction | Code Available | 5 | 5 |
| SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More | Aug 8, 2024 | Image Segmentation, Medical Image Segmentation | Code Available | 5 | 5 |
| Segment Anything | Apr 5, 2023 | Event-based Object Segmentation, Image Segmentation | Code Available | 5 | 5 |
| Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model | Jun 27, 2024 | Mamba, Segmentation | Code Available | 5 | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 | 5 |
| Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | Jan 5, 2024 | image-classification, Image Classification | Code Available | 5 | 5 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 | 5 |
| 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers | Oct 11, 2023 | Decoder, Image Segmentation | Code Available | 4 | 5 |
| Visual In-Context Prompting | Nov 22, 2023 | Decoder, Segmentation | Code Available | 4 | 5 |
| Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation | Feb 16, 2024 | Cardiac Segmentation, Decoder | Code Available | 4 | 5 |
| MedSAM2: Segment Anything in 3D Medical Images and Videos | Apr 4, 2025 | Segmentation, Video Segmentation | Code Available | 4 | 5 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Jun 6, 2022 | Image Segmentation, Instance Segmentation | Code Available | 4 | 5 |
| No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation | Apr 5, 2024 | Few-Shot Learning, Scene Segmentation | Code Available | 4 | 5 |
| Visual Attention Network | Feb 20, 2022 | image-classification, Image Classification | Code Available | 4 | 5 |
| Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation | Mar 29, 2022 | Binary Classification, Segmentation | Code Available | 4 | 5 |
| TotalSegmentator: robust segmentation of 104 anatomical structures in CT images | Aug 11, 2022 | Segmentation | Code Available | 4 | 5 |
| VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning | May 17, 2025 | 2D Object Detection, Object Counting | Code Available | 4 | 5 |
| Semantic-SAM: Segment and Recognize Anything at Any Granularity | Jul 10, 2023 | Image Segmentation, Segmentation | Code Available | 4 | 5 |
| Medical SAM 2: Segment medical images as video via Segment Anything Model 2 | Aug 1, 2024 | Image Segmentation, Interactive Segmentation | Code Available | 4 | 5 |
| Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation | Feb 11, 2024 | Cardiac Segmentation, Contrastive Learning | Code Available | 4 | 5 |
| Segment Anything in Medical Images | Apr 24, 2023 | Diagnostic, Image Segmentation | Code Available | 4 | 5 |
| Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation | Feb 7, 2024 | Cardiac Segmentation, Computational Efficiency | Code Available | 4 | 5 |
| Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Oct 17, 2023 | Interactive Segmentation, Referring Expression | Code Available | 4 | 5 |
| SAMPart3D: Segment Any Part in 3D Objects | Nov 11, 2024 | 3D Generation, 3D Part Segmentation | Code Available | 4 | 5 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 | 5 |
| Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering | Jan 12, 2024 | 3D Panoptic Segmentation, 3D Semantic Segmentation | Code Available | 4 | 5 |
| Highly Accurate Dichotomous Image Segmentation | Mar 6, 2022 | 2k, 3D Reconstruction | Code Available | 4 | 5 |
| LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Dec 28, 2023 | Instance Segmentation, Language Modeling | Code Available | 4 | 5 |
| SegGPT: Segmenting Everything In Context | Apr 6, 2023 | Few-Shot Semantic Segmentation, In-Context Learning | Code Available | 4 | 5 |
| Image Segmentation Keras : Implementation of Segnet, FCN, UNet, PSPNet and other models in Keras | Jul 25, 2023 | Image Segmentation, Segmentation | Code Available | 4 | 5 |
| LISA: Reasoning Segmentation via Large Language Model | Aug 1, 2023 | Language Modeling, Language Modelling | Code Available | 4 | 5 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available | 4 | 5 |
| PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Jun 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 4 | 5 |
| Panoptic Feature Pyramid Networks | Jan 8, 2019 | Instance Segmentation, Panoptic Segmentation | Code Available | 4 | 5 |
| SiamMask: A Framework for Fast Online Object Tracking and Segmentation | Jul 5, 2022 | Multiple Object Tracking, Object | Code Available | 4 | 5 |
| Your ViT is Secretly an Image Segmentation Model | Mar 24, 2025 | Decoder, Image Segmentation | Code Available | 4 | 5 |