| SAM 2: Segment Anything in Images and Videos | Aug 1, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 11 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| OMG-Seg: Is One Model Good Enough For All Segmentation? | Jan 18, 2024 | All, Decoder | Code Available | 5 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 |
| PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Jun 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 4 |
| SegGPT: Segmenting Everything In Context | Apr 6, 2023 | Few-Shot Semantic Segmentation, In-Context Learning | Code Available | 4 |
| SiamMask: A Framework for Fast Online Object Tracking and Segmentation | Jul 5, 2022 | Multiple Object Tracking, Object | Code Available | 4 |
| SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Nov 26, 2024 | Natural Language Understanding, Referring Video Object Segmentation | Code Available | 3 |
| SMITE: Segment Me In TimE | Oct 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 3 |
| VISA: Reasoning Video Object Segmentation via Large Language Models | Jul 16, 2024 | Decoder, Object | Code Available | 3 |
| Moving Object Segmentation: All You Need Is SAM (and Flow) | Apr 18, 2024 | All, Motion Segmentation | Code Available | 3 |
| PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Mar 21, 2024 | Decoder, Generalized Referring Expression Segmentation | Code Available | 3 |
| UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Feb 28, 2024 | Decoder, Referring Expression Segmentation | Code Available | 3 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 |
| Putting the Object Back into Video Object Segmentation | Oct 19, 2023 | Object, Segmentation | Code Available | 3 |
| Tracking Anything with Decoupled Video Segmentation | Sep 7, 2023 | Open-Vocabulary Video Segmentation, Open-World Video Segmentation | Code Available | 3 |
| Segment Anything Meets Point Tracking | Jul 3, 2023 | Interactive Video Object Segmentation, Object | Code Available | 3 |
| Personalize Segment Anything Model with One Shot | May 4, 2023 | Image Generation, model | Code Available | 3 |
| XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | Jul 14, 2022 | 2D Human Pose Estimation, 2D Object Detection | Code Available | 3 |
| VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Jun 5, 2025 | Autonomous Driving, Autonomous Navigation | Code Available | 2 |
| Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration | May 26, 2025 | Domain Generalization, Hallucination | Code Available | 2 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 |
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Mar 5, 2025 | Object, Referring Video Object Segmentation | Code Available | 2 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Nov 26, 2024 | Language Modeling, Large Language Model | Code Available | 2 |
| IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos | Nov 18, 2024 | Pose Estimation, Semantic Segmentation | Code Available | 2 |
| One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Sep 29, 2024 | All, Image Segmentation | Code Available | 2 |
| Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation | Aug 28, 2024 | Object, Semantic Segmentation | Code Available | 2 |
| LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation | Apr 30, 2024 | Attribute, Semantic Segmentation | Code Available | 2 |
| Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation | Apr 21, 2024 | Semantic Segmentation, Video Object Segmentation | Code Available | 2 |
| Efficient Video Object Segmentation via Modulated Cross-Attention Memory | Mar 26, 2024 | GPU, Object | Code Available | 2 |
| Vivim: a Video Vision Mamba for Medical Video Segmentation | Jan 25, 2024 | Lesion Segmentation, Mamba | Code Available | 2 |
| UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | Dec 25, 2023 | Image Segmentation, Object | Code Available | 2 |
| MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | Aug 16, 2023 | Motion Expressions Guided Video Segmentation, Object | Code Available | 2 |
| XMem++: Production-level Video Segmentation From Few Annotated Frames | Jul 29, 2023 | Segmentation, Semantic Segmentation | Code Available | 2 |
| Tracking Anything in High Quality | Jul 26, 2023 | Object, Object Tracking | Code Available | 2 |
| Video Object Segmentation in Panoptic Wild Scenes | May 8, 2023 | Object, Semantic Segmentation | Code Available | 2 |
| MOSE: A New Dataset for Video Object Segmentation in Complex Scenes | Feb 3, 2023 | Object, Segmentation | Code Available | 2 |
| VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | Oct 28, 2022 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 2 |
| Decoupling Features in Hierarchical Propagation for Video Object Segmentation | Oct 18, 2022 | Object, Semantic Segmentation | Code Available | 2 |
| In Defense of Online Models for Video Instance Segmentation | Jul 21, 2022 | Contrastive Learning, Instance Segmentation | Code Available | 2 |
| Video Polyp Segmentation: A Deep Learning Perspective | Mar 27, 2022 | Attribute, Deep Learning | Code Available | 2 |
| Scalable Video Object Segmentation with Identification Mechanism | Mar 22, 2022 | Object, Segmentation | Code Available | 2 |
| Language as Queries for Referring Video Object Segmentation | Jan 3, 2022 | Object, Object Tracking | Code Available | 2 |
| Fast Online Object Tracking and Segmentation: A Unifying Approach | Dec 12, 2018 | Object, Object Tracking | Code Available | 2 |
| M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Jun 15, 2025 | Object, Semantic Segmentation | Code Available | 1 |
| Video-GPT via Next Clip Diffusion | May 18, 2025 | Denoising, Image Animation | Code Available | 1 |
| Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2 | May 3, 2025 | Computed Tomography (CT), Semantic Segmentation | Code Available | 1 |