| Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation | Jul 13, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| MUVOD: A Novel Multi-view Video Object Segmentation Dataset and A Benchmark for 3D Segmentation | Jul 10, 2025 | NeRF, Object | Unverified | 0 |
| Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder | Jun 28, 2025 | Image Segmentation, Large Language Model | Code Available | 1 |
| CogGen: A Learner-Centered Generative AI Architecture for Intelligent Tutoring with Programming Video | Jun 25, 2025 | Knowledge Tracing, Video Segmentation | Unverified | 0 |
| Leader360V: The Large-scale, Real-world 360 Video Dataset for Multi-task Learning in Diverse Environment | Jun 17, 2025 | Autonomous Driving, Instance Segmentation | Unverified | 0 |
| A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects | Jun 16, 2025 | Benchmarking, Instance Segmentation | Unverified | 0 |
| Q-SAM2: Accurate Quantization for Segment Anything Model 2 | Jun 11, 2025 | Quantization, Video Segmentation | Unverified | 0 |
| SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost | Jun 2, 2025 | Image Segmentation, Semantic Segmentation | Code Available | 1 |
| OmniFall: A Unified Staged-to-Wild Benchmark for Human Fall Detection | May 26, 2025 | Video Segmentation, Video Semantic Segmentation | Code Available | 0 |
| ThinkVideo: High-Quality Reasoning Video Segmentation with Chain of Thoughts | May 24, 2025 | Image Segmentation, Instance Segmentation | Code Available | 0 |
| Unlocking the Power of SAM 2 for Few-Shot Segmentation | May 20, 2025 | Segmentation, Video Segmentation | Code Available | 1 |
| FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching | May 19, 2025 | Instance Segmentation, Segmentation | Unverified | 0 |
| VolE: A Point-cloud Framework for Food 3D Reconstruction and Volume Estimation | May 15, 2025 | 3D Reconstruction, Camera Calibration | Unverified | 0 |
| TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | May 2, 2025 | Dense Captioning, Highlight Detection | Code Available | 1 |
| DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency | Apr 16, 2025 | Few-Shot Learning, Interactive Segmentation | Code Available | 1 |
| PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild | Apr 15, 2025 | Segmentation, Semantic Segmentation | Unverified | 0 |
| GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation | Apr 10, 2025 | Contrastive Learning, Language Modeling | Code Available | 2 |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Apr 7, 2025 | Inference Optimization, Referring Video Object Segmentation | Code Available | 5 |
| MedSAM2: Segment Anything in 3D Medical Images and Videos | Apr 4, 2025 | Segmentation, Video Segmentation | Code Available | 4 |
| Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation | Mar 27, 2025 | Binary Classification, Video Segmentation | Unverified | 0 |
| Online Reasoning Video Segmentation with Just-in-Time Digital Twins | Mar 27, 2025 | Reasoning Segmentation, Video Segmentation | Unverified | 0 |
| CamSAM2: Segment Anything Accurately in Camouflaged Videos | Mar 25, 2025 | Camouflaged Object Segmentation, Object | Code Available | 1 |
| Reducing Annotation Burden: Exploiting Image Knowledge for Few-Shot Medical Video Object Segmentation via Spatiotemporal Consistency Relearning | Mar 19, 2025 | Segmentation, Semantic Segmentation | Code Available | 0 |
| Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking | Mar 18, 2025 | Descriptive, Instance Segmentation | Code Available | 0 |
| SAM2 for Image and Video Segmentation: A Comprehensive Survey | Mar 17, 2025 | Autonomous Driving, Image Segmentation | Unverified | 0 |
| Open-World Skill Discovery from Unsegmented Demonstrations | Mar 11, 2025 | Boundary Detection, Event Segmentation | Unverified | 0 |
| OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation | Mar 10, 2025 | Pseudo Label, Semantic Segmentation | Unverified | 0 |
| Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching | Mar 5, 2025 | Data Augmentation, Few-Shot Learning | Unverified | 0 |
| Parameter-free Video Segmentation for Vision and Language Understanding | Mar 3, 2025 | Question Answering, Video Question Answering | Unverified | 0 |
| BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports | Feb 28, 2025 | Action Recognition, Line Detection | Code Available | 1 |
| An Analysis of Data Transformation Effects on Segment Anything 2 | Feb 25, 2025 | Semantic Segmentation, Video Object Segmentation | Unverified | 0 |
| Deep learning approaches to surgical video segmentation and object detection: A Scoping Review | Feb 23, 2025 | object-detection, Object Detection | Unverified | 0 |
| Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field | Feb 22, 2025 | 2D Panoptic Segmentation, 3D Scene Reconstruction | Unverified | 0 |
| Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation | Feb 20, 2025 | Video Segmentation, Video Semantic Segmentation | Unverified | 0 |
| SASVi - Segment Any Surgical Video | Feb 12, 2025 | Segmentation, Video Segmentation | Code Available | 1 |
| Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors | Jan 27, 2025 | Image Matting, Video Segmentation | Unverified | 0 |
| MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | Jan 23, 2025 | Referring Expression Segmentation, Referring Video Object Segmentation | Code Available | 1 |
| Efficient Frame Extraction: A Novel Approach Through Frame Similarity and Surgical Tool Tracking for Video Segmentation | Jan 19, 2025 | Video Segmentation, Video Semantic Segmentation | Code Available | 0 |
| Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks | Jan 17, 2025 | Few-Shot Semantic Segmentation, Segmentation | Code Available | 1 |
| EdgeTAM: On-Device Track Anything Model | Jan 13, 2025 | model, Video Segmentation | Code Available | 4 |
| VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Jan 12, 2025 | Dense Video Captioning, Video Captioning | Code Available | 1 |
| Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation | Jan 12, 2025 | Image Retrieval, Image Segmentation | Unverified | 0 |
| Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Jan 7, 2025 | 2k, Language Modeling | Code Available | 5 |
| Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy | Jan 6, 2025 | Video Segmentation, Video Semantic Segmentation | Code Available | 0 |
| EntitySAM: Segment Everything in Video | Jan 1, 2025 | Decoder, Object | Unverified | 0 |
| Decoupled Motion Expression Video Segmentation | Jan 1, 2025 | Instance Segmentation, Referring Video Object Segmentation | Unverified | 0 |
| VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Jan 1, 2025 | Large Language Model, Video Segmentation | Unverified | 0 |
| HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver | Jan 1, 2025 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation | Dec 31, 2024 | All, Segmentation | Unverified | 0 |
| Generative Video Propagation | Dec 27, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |