Video Semantic Segmentation

Video semantic segmentation assigns one class from a predefined set to every pixel in every frame of a video. A model must therefore predict accurate segmentation masks and also keep those masks temporally consistent across frames. The task has applications in autonomous driving, medical video analysis, and AR/VR.
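The two requirements described here, per-pixel accuracy and temporal consistency, can be illustrated with a small sketch. It assumes a video is stored as a list of frames, each an H x W grid of integer class ids. The names `mean_iou` and `label_stability` are hypothetical helpers written for this example, not part of any benchmark toolkit, and `label_stability` is only a naive proxy for temporal consistency because it ignores motion (published evaluations typically warp predictions with optical flow first).

```python
from typing import List

Frame = List[List[int]]  # H x W grid of class ids
Video = List[Frame]      # T frames


def mean_iou(pred: Video, target: Video, num_classes: int) -> float:
    """Mean intersection-over-union, accumulated over all pixels of all frames."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_frame, t_frame in zip(pred, target):
        for p_row, t_row in zip(p_frame, t_frame):
            for p, t in zip(p_row, t_row):
                if p == t:
                    inter[p] += 1
                    union[p] += 1
                else:
                    # The pixel belongs to the predicted set of class p
                    # and to the ground-truth set of class t.
                    union[p] += 1
                    union[t] += 1
    # Classes absent from both prediction and ground truth are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def label_stability(pred: Video) -> float:
    """Fraction of pixels whose label is unchanged between consecutive frames.

    Naive proxy (assumption of this sketch): no motion compensation, so
    correctly segmented moving objects also lower the score.
    """
    same = 0
    total = 0
    for prev, cur in zip(pred, pred[1:]):
        for row_prev, row_cur in zip(prev, cur):
            for a, b in zip(row_prev, row_cur):
                same += int(a == b)
                total += 1
    return same / total if total else 1.0


# Toy example: two 2 x 2 frames, two classes, one flickering pixel.
target = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
pred = [[[0, 0], [1, 1]], [[0, 1], [1, 1]]]
print(mean_iou(pred, target, num_classes=2))
print(label_stability(pred))
```

In the toy example a single pixel flips class in the second frame, which lowers both scores: the flip is a segmentation error against the ground truth and a temporal inconsistency between frames.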

Papers


Showing 51–100 of 895 papers

Title	Date	Tasks	Status	Hype
Leveraging Motion Information for Better Self-Supervised Video Correspondence Learning	Mar 15, 2025	Object, Semantic Segmentation	Unverified	0
Investigation of Frame Differences as Motion Cues for Video Object Segmentation	Mar 12, 2025	Optical Flow Estimation, Segmentation	Unverified	0
Open-World Skill Discovery from Unsegmented Demonstrations	Mar 11, 2025	Boundary Detection, Event Segmentation	Unverified	0
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation	Mar 10, 2025	Pseudo Label, Semantic Segmentation	Unverified	0
Rethinking Few-Shot Medical Image Segmentation by SAM2: A Training-Free Framework with Augmentative Prompting and Dynamic Matching	Mar 5, 2025	Data Augmentation, Few-Shot Learning	Unverified	0
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation	Mar 5, 2025	Object, Referring Video Object Segmentation	Code Available	2
Parameter-free Video Segmentation for Vision and Language Understanding	Mar 3, 2025	Question Answering, Video Question Answering	Unverified	0
BST: Badminton Stroke-type Transformer for Skeleton-based Action Recognition in Racket Sports	Feb 28, 2025	Action Recognition, Line Detection	Code Available	1
An Analysis of Data Transformation Effects on Segment Anything 2	Feb 25, 2025	Semantic Segmentation, Video Object Segmentation	Unverified	0
Deep learning approaches to surgical video segmentation and object detection: A Scoping Review	Feb 23, 2025	object-detection, Object Detection	Unverified	0
Pointmap Association and Piecewise-Plane Constraint for Consistent and Compact 3D Gaussian Segmentation Field	Feb 22, 2025	2D Panoptic Segmentation, 3D Scene Reconstruction	Unverified	0
Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation	Feb 20, 2025	Video Segmentation, Video Semantic Segmentation	Unverified	0
SASVi - Segment Any Surgical Video	Feb 12, 2025	Segmentation, Video Segmentation	Code Available	1
Wandering around: A bioinspired approach to visual attention through object motion sensitivity	Feb 10, 2025	Low-latency processing, Motion Segmentation	Code Available	0
HD-EPIC: A Highly-Detailed Egocentric Video Dataset	Feb 6, 2025	Action Recognition, Nutrition	Unverified	0
Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors	Jan 27, 2025	Image Matting, Video Segmentation	Unverified	0
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations	Jan 24, 2025	Decoder, Object	Unverified	0
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation	Jan 23, 2025	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	1
Efficient Frame Extraction: A Novel Approach Through Frame Similarity and Surgical Tool Tracking for Video Segmentation	Jan 19, 2025	Video Segmentation, Video Semantic Segmentation	Code Available	0
Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks	Jan 17, 2025	Few-Shot Semantic Segmentation, Segmentation	Code Available	1
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation	Jan 14, 2025	Object, object-detection	Code Available	1
EdgeTAM: On-Device Track Anything Model	Jan 13, 2025	model, Video Segmentation	Code Available	4
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning	Jan 12, 2025	Dense Video Captioning, Video Captioning	Code Available	1
Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation	Jan 12, 2025	Image Retrieval, Image Segmentation	Unverified	0
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation	Jan 9, 2025	Referring Video Object Segmentation, Semantic Segmentation	Code Available	0
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Jan 7, 2025	2k, Language Modeling	Code Available	5
Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy	Jan 6, 2025	Video Segmentation, Video Semantic Segmentation	Code Available	0
EntitySAM: Segment Everything in Video	Jan 1, 2025	Decoder, Object	Unverified	0
Semantic and Sequential Alignment for Referring Video Object Segmentation	Jan 1, 2025	Instance Segmentation, Referring Video Object Segmentation	Unverified	0
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos	Jan 1, 2025	Large Language Model, Video Segmentation	Unverified	0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model	Jan 1, 2025	Moment Retrieval, Referring Video Object Segmentation	Code Available	0
Decoupled Motion Expression Video Segmentation	Jan 1, 2025	Instance Segmentation, Referring Video Object Segmentation	Unverified	0
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver	Jan 1, 2025	Reasoning Segmentation, Segmentation	Code Available	2
VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models	Jan 1, 2025	Segmentation, Semantic Segmentation	Unverified	0
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation	Dec 31, 2024	All, Segmentation	Unverified	0
Generative Video Propagation	Dec 27, 2024	Image to Video Generation, Video Generation	Unverified	0
When SAM2 Meets Video Shadow and Mirror Detection	Dec 26, 2024	Image Segmentation, Mirror Detection	Code Available	0
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning Segmentation, Segmentation	Code Available	2
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation	Dec 18, 2024	Object, Semantic Segmentation	Code Available	1
Towards Open-Vocabulary Video Semantic Segmentation	Dec 12, 2024	Segmentation, Semantic Segmentation	Code Available	1
Static-Dynamic Class-level Perception Consistency in Video Semantic Segmentation	Dec 11, 2024	Autonomous Driving, Contrastive Learning	Unverified	0
Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation	Dec 11, 2024	Video Segmentation, Video Semantic Segmentation	Unverified	0
Stable Mean Teacher for Semi-supervised Video Action Detection	Dec 10, 2024	Action Detection, Semantic Segmentation	Code Available	0
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity	Dec 9, 2024	Anomaly Detection, text annotation	Code Available	2
Video Decomposition Prior: A Methodology to Decompose Videos into Layers	Dec 6, 2024	Semantic Segmentation, Video Editing	Unverified	0
Referring Video Object Segmentation via Language-aligned Track Selection	Dec 2, 2024	Object, Object Tracking	Code Available	1
Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes	Dec 2, 2024	In-Context Learning, Video Segmentation	Code Available	3
Multi-Granularity Video Object Segmentation	Dec 2, 2024	Object, Segmentation	Code Available	1
Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation	Nov 28, 2024	3D Reconstruction, Segmentation	Unverified	0
Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2	Nov 28, 2024	Video Segmentation, Video Semantic Segmentation	Code Available	2



Datasets: Cityscapes val, CamVid, VSPW, LaRS, Multispectral Video Semantic Segmentation

Benchmark Results

Cityscapes val

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

CamVid

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

VSPW

#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

LaRS

#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

Multispectral Video Semantic Segmentation

#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified