Video Semantic Segmentation

Video semantic segmentation assigns one of a predefined set of classes to every pixel in every frame of a video. The model must predict accurate segmentation masks and also keep those masks temporally consistent across frames. The task has applications in areas such as autonomous driving, medical video analysis, and AR/VR.
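As a rough illustration of the two requirements in this definition, the sketch below turns per-pixel class scores into label maps and measures how often labels change between consecutive frames. The function names `label_map` and `label_change_rate` are hypothetical, and the raw label-change rate is only a crude proxy chosen for this example: it also counts legitimate changes caused by scene motion, which a real evaluation would usually compensate for before comparing frames.

```python
from typing import List, Sequence

Scores = Sequence[Sequence[Sequence[float]]]  # scores[y][x][class]
Labels = List[List[int]]                      # labels[y][x]


def label_map(scores: Scores) -> Labels:
    """Pick the highest-scoring class for every pixel of one frame."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


def label_change_rate(frames: Sequence[Labels]) -> float:
    """Fraction of pixels whose label differs between consecutive frames.

    Assumption for this sketch: no motion compensation is applied, so the
    value is only meaningful for (near-)static content.
    """
    changed = 0
    total = 0
    for prev, cur in zip(frames, frames[1:]):
        for prev_row, cur_row in zip(prev, cur):
            for a, b in zip(prev_row, cur_row):
                changed += int(a != b)
                total += 1
    return changed / total if total else 0.0


# Toy video: 2 frames, 1x2 pixels, 3 classes.
video_scores = [
    [[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]],
    [[[0.2, 0.5, 0.3], [0.2, 0.1, 0.7]]],
]
video_labels = [label_map(frame) for frame in video_scores]
print(video_labels)                     # [[[1, 0]], [[1, 2]]]
print(label_change_rate(video_labels))  # 0.5
```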

Papers

Showing 76–100 of 895 papers

Title	Date	Tasks	Status	Hype
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Jan 7, 2025	2k, Language Modeling	Code Available	5
Segment Anything Model for Zero-shot Single Particle Tracking in Liquid Phase Transmission Electron Microscopy	Jan 6, 2025	Video Segmentation, Video Semantic Segmentation	Code Available	0
EntitySAM: Segment Everything in Video	Jan 1, 2025	Decoder, Object	Unverified	0
Semantic and Sequential Alignment for Referring Video Object Segmentation	Jan 1, 2025	Instance Segmentation, Referring Video Object Segmentation	Unverified	0
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos	Jan 1, 2025	Large Language Model, Video Segmentation	Unverified	0
DTOS: Dynamic Time Object Sensing with Large Multimodal Model	Jan 1, 2025	Moment Retrieval, Referring Video Object Segmentation	Code Available	0
Decoupled Motion Expression Video Segmentation	Jan 1, 2025	Instance Segmentation, Referring Video Object Segmentation	Unverified	0
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver	Jan 1, 2025	Reasoning Segmentation, Segmentation	Code Available	2
VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models	Jan 1, 2025	Segmentation, Semantic Segmentation	Unverified	0
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation	Dec 31, 2024	All, Segmentation	Unverified	0
Generative Video Propagation	Dec 27, 2024	Image to Video Generation, Video Generation	Unverified	0
When SAM2 Meets Video Shadow and Mirror Detection	Dec 26, 2024	Image Segmentation, Mirror Detection	Code Available	0
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning Segmentation, Segmentation	Code Available	2
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation	Dec 18, 2024	Object, Semantic Segmentation	Code Available	1
Towards Open-Vocabulary Video Semantic Segmentation	Dec 12, 2024	Segmentation, Semantic Segmentation	Code Available	1
Static-Dynamic Class-level Perception Consistency in Video Semantic Segmentation	Dec 11, 2024	Autonomous Driving, Contrastive Learning	Unverified	0
Collaborative Hybrid Propagator for Temporal Misalignment in Audio-Visual Segmentation	Dec 11, 2024	Video Segmentation, Video Semantic Segmentation	Unverified	0
Stable Mean Teacher for Semi-supervised Video Action Detection	Dec 10, 2024	Action Detection, Semantic Segmentation	Code Available	0
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity	Dec 9, 2024	Anomaly Detection, text annotation	Code Available	2
Video Decomposition Prior: A Methodology to Decompose Videos into Layers	Dec 6, 2024	Semantic Segmentation, Video Editing	Unverified	0
Referring Video Object Segmentation via Language-aligned Track Selection	Dec 2, 2024	Object, Object Tracking	Code Available	1
Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes	Dec 2, 2024	In-Context Learning, Video Segmentation	Code Available	3
Multi-Granularity Video Object Segmentation	Dec 2, 2024	Object, Segmentation	Code Available	1
Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation	Nov 28, 2024	3D Reconstruction, Segmentation	Unverified	0
Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2	Nov 28, 2024	Video Segmentation, Video Semantic Segmentation	Code Available	2

Datasets: Cityscapes val, CamVid, VSPW, LaRS, Multispectral Video Semantic Segmentation

Benchmark Results

Cityscapes val
#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

CamVid
#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

VSPW
#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

LaRS
#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

Multispectral Video Semantic Segmentation
#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified
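
Most of the tables above report mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute it over a video: intersections and unions are accumulated per class across all frames, then the per-class IoUs are averaged. The function name `mean_iou`, the nested-list label format, and the `ignore_index=255` default are assumptions made for this example. Individual benchmarks may differ in details such as which classes are counted or how unlabeled pixels are handled.

```python
from typing import Optional, Sequence

Frame = Sequence[Sequence[int]]  # one label map: rows of integer class ids


def mean_iou(
    preds: Sequence[Frame],
    targets: Sequence[Frame],
    num_classes: int,
    ignore_index: int = 255,  # assumed id for unlabeled ground-truth pixels
) -> Optional[float]:
    """mIoU accumulated over every pixel of every frame.

    Predictions are assumed to lie in [0, num_classes). Classes that appear
    in neither the predictions nor the ground truth are skipped. Returns
    None if no class is present at all.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, target in zip(preds, targets):
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if t == ignore_index:
                    continue  # unlabeled pixel: not scored
                if p == t:
                    inter[t] += 1
                    union[t] += 1
                else:
                    # A mismatch enlarges the union of both classes involved.
                    union[t] += 1
                    union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else None


# Toy video: 2 frames of 2x2 pixels, 2 classes, one unlabeled pixel (255).
preds = [[[0, 0], [1, 1]], [[0, 1], [1, 1]]]
targets = [[[0, 0], [1, 0]], [[0, 1], [1, 255]]]
score = mean_iou(preds, targets, num_classes=2)
print(f"{100 * score:.1f}")  # 75.0, on the percentage scale used in the tables
```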