| When SAM2 Meets Video Shadow and Mirror Detection | Dec 26, 2024 | Image Segmentation, Mirror Detection | Code Available | 0 |
| InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Dec 18, 2024 | Reasoning Segmentation, Segmentation | Code Available | 2 |
| Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation | Dec 11, 2024 | Video Segmentation, Video Semantic Segmentation | Unverified | 0 |
| Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity | Dec 9, 2024 | Anomaly Detection, text annotation | Code Available | 2 |
| Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes | Dec 2, 2024 | In-Context Learning, Video Segmentation | Code Available | 3 |
| Multi-Granularity Video Object Segmentation | Dec 2, 2024 | Object, Segmentation | Code Available | 1 |
| Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2 | Nov 28, 2024 | Video Segmentation, Video Semantic Segmentation | Code Available | 2 |
| Efficient Track Anything | Nov 28, 2024 | Object, Segmentation | Code Available | 7 |
| RoMo: Robust Motion Segmentation Improves Structure from Motion | Nov 27, 2024 | Camera Calibration, Motion Segmentation | Unverified | 0 |
| SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation | Nov 26, 2024 | Natural Language Understanding, Referring Video Object Segmentation | Code Available | 3 |
| Geometric Algebra Planes: Convex Implicit Neural Volumes | Nov 20, 2024 | Decoder, Video Segmentation | Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | Unverified | 0 |
| Zero-shot capability of SAM-family models for bone segmentation in CT scans | Nov 13, 2024 | Image Segmentation, Medical Image Segmentation | Unverified | 0 |
| MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data | Nov 12, 2024 | Segmentation, Uncertainty Quantification | Code Available | 0 |
| GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting | Nov 12, 2024 | 3DGS, graph construction | Unverified | 0 |
| Breaking The Ice: Video Segmentation for Close-Range Ice-Covered Waters | Nov 7, 2024 | Image Segmentation, Optical Flow Estimation | Unverified | 0 |
| VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Nov 7, 2024 | Decoder, Language Modeling | Unverified | 0 |
| SMITE: Segment Me In TimE | Oct 24, 2024 | Segmentation, Semantic Segmentation | Code Available | 3 |
| VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation | Oct 22, 2024 | Segmentation, Video Segmentation | Code Available | 0 |
| SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Oct 21, 2024 | Heuristic Search, Object | Code Available | 4 |
| Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation | Oct 17, 2024 | Multi-Object Tracking, Multi-Object Tracking and Segmentation | Unverified | 0 |
| Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation | Oct 16, 2024 | Benchmarking, Panoptic Segmentation | Unverified | 0 |
| VideoSAM: Open-World Video Segmentation | Oct 11, 2024 | Autonomous Driving, Decoder | Unverified | 0 |
| Shift and matching queries for video semantic segmentation | Oct 10, 2024 | Image Segmentation, Segmentation | Unverified | 0 |
| Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Sep 25, 2024 | Object, Object Tracking | Code Available | 5 |