| SAM 2: Segment Anything in Images and Videos | Aug 1, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 11 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 |
| Segment Anything in Medical Images and Videos: Benchmark and Deployment | Aug 6, 2024 | Benchmarking, Segmentation | Code Available | 7 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Sep 25, 2024 | Object, Object Tracking | Code Available | 5 |
| Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey | Aug 23, 2024 | Image Segmentation, Segmentation | Code Available | 5 |
| MedSAM2: Segment Anything in 3D Medical Images and Videos | Apr 4, 2025 | Segmentation, Video Segmentation | Code Available | 4 |
| EdgeTAM: On-Device Track Anything Model | Jan 13, 2025 | model, Video Segmentation | Code Available | 4 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 |
| PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Jun 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 4 |
| Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes | Dec 2, 2024 | In-Context Learning, Video Segmentation | Code Available | 3 |
| SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Nov 26, 2024 | Natural Language Understanding, Referring Video Object Segmentation | Code Available | 3 |
| SMITE: Segment Me In TimE | Oct 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 3 |
| Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2 | Aug 3, 2024 | Diversity, Segmentation | Code Available | 3 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | Decoder, Object | Code Available | 3 |
| UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Feb 28, 2024 | Decoder, Referring Expression Segmentation | Code Available | 3 |
| RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Jan 18, 2024 | All, Decoder | Code Available | 3 |
| Tracking Anything with Decoupled Video Segmentation | Sep 7, 2023 | Open-Vocabulary Video Segmentation, Open-World Video Segmentation | Code Available | 3 |
| VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | Aug 28, 2023 | Instance Segmentation, Optical Flow Estimation | Code Available | 3 |
| Segment Anything Meets Point Tracking | Jul 3, 2023 | Interactive Video Object Segmentation, Object | Code Available | 3 |
| Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation | Mar 29, 2022 | Contrastive Learning, Segmentation | Code Available | 3 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 |
| HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver | Jan 1, 2025 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Dec 18, 2024 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity | Dec 9, 2024 | Anomaly Detection, text annotation | Code Available | 2 |
| Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2 | Nov 28, 2024 | Video Segmentation, Video Semantic Segmentation | Code Available | 2 |
| Self-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo-SAM 2 Model | Sep 14, 2024 | Medical Image Segmentation, Polyp Segmentation | Code Available | 2 |
| Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning | Aug 15, 2024 | Segmentation, Video Segmentation | Code Available | 2 |
| Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | Apr 4, 2024 | Contrastive Learning, Referring Expression | Code Available | 2 |
| DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | Mar 29, 2024 | Object, Video Instance Segmentation | Code Available | 2 |
| MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation | Jan 1, 2024 | Segmentation, Video Segmentation | Code Available | 2 |
| MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Aug 16, 2023 | Motion Expressions Guided Video Segmentation, Object | Code Available | 2 |
| XMem++: Production-level Video Segmentation From Few Annotated Frames | Jul 29, 2023 | Segmentation, Semantic Segmentation | Code Available | 2 |
| InstMove: Instance Motion for Object-centric Video Segmentation | Mar 14, 2023 | Object, Optical Flow Estimation | Code Available | 2 |
| Mask2Former for Video Instance Segmentation | Dec 20, 2021 | Image Segmentation, Instance Segmentation | Code Available | 2 |
| Simplifying Object Segmentation with PixelLib Library | Jan 20, 2021 | Image Classification, Instance Segmentation | Code Available | 2 |
| TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation | Oct 12, 2020 | Sign Language Recognition, Sign Language Translation | Code Available | 2 |
| Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Jun 28, 2025 | Image Segmentation, Large Language Model | Code Available | 1 |
| SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost | Jun 2, 2025 | Image Segmentation, Semantic Segmentation | Code Available | 1 |
| Unlocking the Power of SAM 2 for Few-Shot Segmentation | May 20, 2025 | Segmentation, Video Segmentation | Code Available | 1 |
| TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | May 2, 2025 | Dense Captioning, Highlight Detection | Code Available | 1 |
| DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Apr 16, 2025 | Few-Shot Learning, Interactive Segmentation | Code Available | 1 |
| CamSAM2: Segment Anything Accurately in Camouflaged Videos | Mar 25, 2025 | Camouflaged Object Segmentation, Object | Code Available | 1 |
| BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Feb 28, 2025 | Action Recognition, Line Detection | Code Available | 1 |
| SASVi - Segment Any Surgical Video | Feb 12, 2025 | Segmentation, Video Segmentation | Code Available | 1 |
| MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Jan 23, 2025 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 1 |
| Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks | Jan 17, 2025 | Few-Shot Semantic Segmentation, Segmentation | Code Available | 1 |
| VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Jan 12, 2025 | Dense Video Captioning, Video Captioning | Code Available | 1 |
| Multi-Granularity Video Object Segmentation | Dec 2, 2024 | Object, Segmentation | Code Available | 1 |