| SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction | Jul 21, 2025 | Object, Segmentation | Unverified | 0 |
| Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation | Jul 13, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| MUVOD: A Novel Multi-view Video Object Segmentation Dataset and A Benchmark for 3D Segmentation | Jul 10, 2025 | NeRF, Object | Unverified | 0 |
| M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Jun 15, 2025 | Object, Semantic Segmentation | Code Available | 1 |
| THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation | Jun 7, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Jun 5, 2025 | Autonomous Driving, Autonomous Navigation | Code Available | 2 |
| InterRVOS: Interaction-aware Referring Video Object Segmentation | Jun 3, 2025 | 8k, Object | Unverified | 0 |
| Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration | May 26, 2025 | Domain Generalization, Hallucination | Code Available | 2 |
| ThinkVideo: High-Quality Reasoning Video Segmentation with Chain of Thoughts | May 24, 2025 | Image Segmentation, Instance Segmentation | Code Available | 0 |
| Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation | May 19, 2025 | Referring Video Object Segmentation, Semantic Segmentation | Unverified | 0 |
| Video-GPT via Next Clip Diffusion | May 18, 2025 | Denoising, Image Animation | Code Available | 1 |
| 6D Pose Estimation on Spoons and Hands | May 5, 2025 | 6D Pose Estimation, Pose Estimation | Unverified | 0 |
| Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2 | May 3, 2025 | Computed Tomography (CT), Semantic Segmentation | Code Available | 1 |
| MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection | Apr 30, 2025 | Instance Segmentation, Interactive Segmentation | Unverified | 0 |
| RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory | Apr 23, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| Few-Shot Referring Video Single- and Multi-Object Segmentation via Cross-Modal Affinity with Instance Sequence Matching | Apr 18, 2025 | Object, Referring Video Object Segmentation | Code Available | 0 |
| DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Apr 16, 2025 | Few-Shot Learning, Interactive Segmentation | Code Available | 1 |
| PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild | Apr 15, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| MASSeg : 2nd Technical Report for 4th PVUW MOSE Track | Apr 14, 2025 | Data Augmentation, Object | Code Available | 0 |
| FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution | Apr 13, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| STSeg-Complex Video Object Segmentation: The 1st Solution for 4th PVUW MOSE Challenge | Apr 11, 2025 | Semantic Segmentation, Video Object Segmentation | Unverified | 0 |
| Multi-person Physics-based Pose Estimation for Combat Sports | Apr 11, 2025 | 3D Human Pose Estimation, 3D Multi-Person Pose Estimation | Unverified | 0 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 |
| Saliency-Motion Guided Trunk-Collateral Network for Unsupervised Video Object Segmentation | Apr 8, 2025 | Optical Flow Estimation, Salient Object Detection | Unverified | 0 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| Zero-Shot 4D Lidar Panoptic Segmentation | Apr 1, 2025 | Diversity, Panoptic Segmentation | Unverified | 0 |
| CamoSAM2: Motion-Appearance Induced Auto-Refining Prompts for Video Camouflaged Object Detection | Apr 1, 2025 | Camouflaged Object Segmentation, object-detection | Unverified | 0 |
| 4th PVUW MeViS 3rd Place Report: Sa2VA | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025 | Mar 30, 2025 | Object, Referring Video Object Segmentation | Code Available | 0 |
| One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks | Mar 19, 2025 | Decoder, Segmentation | Code Available | 0 |
| Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning | Mar 19, 2025 | Segmentation, Semantic Segmentation | Code Available | 0 |
| AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations | Mar 17, 2025 | Semantic Segmentation, Video Generation | Unverified | 0 |
| Leveraging Motion Information for Better Self-Supervised Video Correspondence Learning | Mar 15, 2025 | Object, Semantic Segmentation | Unverified | 0 |
| Investigation of Frame Differences as Motion Cues for Video Object Segmentation | Mar 12, 2025 | Optical Flow Estimation, Segmentation | Unverified | 0 |
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | Mar 5, 2025 | Object, Referring Video Object Segmentation | Code Available | 2 |
| An Analysis of Data Transformation Effects on Segment Anything 2 | Feb 25, 2025 | Semantic Segmentation, Video Object Segmentation | Unverified | 0 |
| Wandering around: A bioinspired approach to visual attention through object motion sensitivity | Feb 10, 2025 | Low-latency processing, Motion Segmentation | Code Available | 0 |
| HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Feb 6, 2025 | Action Recognition, Nutrition | Unverified | 0 |
| ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | Jan 24, 2025 | Decoder, Object | Unverified | 0 |
| MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Jan 23, 2025 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 1 |
| Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation | Jan 14, 2025 | Object, object-detection | Code Available | 1 |
| Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation | Jan 9, 2025 | Referring Video Object Segmentation, Semantic Segmentation | Code Available | 0 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Decoupled Motion Expression Video Segmentation | Jan 1, 2025 | Instance Segmentation, Referring Video Object Segmentation | Unverified | 0 |
| DTOS: Dynamic Time Object Sensing with Large Multimodal Model | Jan 1, 2025 | Moment Retrieval, Referring Video Object Segmentation | Code Available | 0 |
| Semantic and Sequential Alignment for Referring Video Object Segmentation | Jan 1, 2025 | Instance Segmentation, Referring Video Object Segmentation | Unverified | 0 |
| M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Dec 18, 2024 | Object, Semantic Segmentation | Code Available | 1 |
| Stable Mean Teacher for Semi-supervised Video Action Detection | Dec 10, 2024 | Action Detection, Semantic Segmentation | Code Available | 0 |
| Video Decomposition Prior: A Methodology to Decompose Videos into Layers | Dec 6, 2024 | Semantic Segmentation, Video Editing | Unverified | 0 |
| Referring Video Object Segmentation via Language-aligned Track Selection | Dec 2, 2024 | Object, Object Tracking | Code Available | 1 |