Video Semantic Segmentation

The goal of video semantic segmentation is to assign a predefined class to each pixel in all frames of a video. This requires the model not only to predict accurate segmentation masks but also to ensure that these masks remain temporally consistent across frames. This task has broad applications in areas such as autonomous driving, medical video analysis, and AR/VR.
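
The two requirements above, per-pixel accuracy and temporal consistency, can be illustrated with a minimal NumPy sketch. The function names, the `ignore_index` default of 255, and the toy arrays are assumptions made for illustration and do not come from any listed paper or benchmark toolkit. The mIoU here is the usual confusion-matrix form accumulated over all frames. The agreement score is only a naive proxy: published temporal-consistency metrics typically warp predictions with optical flow before comparing frames, which this sketch does not do.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes, from a confusion matrix pooled over all frames.

    pred, target: integer label arrays of identical shape, e.g. (T, H, W).
    Pixels whose target equals ignore_index are excluded (assumed convention).
    Classes that appear in neither pred nor target are skipped.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Rows index the ground-truth class, columns the predicted class.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0
    return float((intersection[valid] / union[valid]).mean())


def naive_temporal_agreement(pred):
    """Fraction of pixels keeping the same label between consecutive frames.

    pred: integer label array of shape (T, H, W) with T >= 2.
    No motion compensation is applied, so real scene motion lowers the score.
    """
    pred = np.asarray(pred)
    return float((pred[1:] == pred[:-1]).mean())


# Toy clip: 2 frames of 2x2 pixels, 2 classes.
target = np.array([[[0, 0], [1, 1]], [[0, 0], [1, 1]]])
pred = np.array([[[0, 0], [1, 1]], [[0, 1], [1, 1]]])

print(mean_iou(pred, target, num_classes=2))   # 0.775 (class IoUs 0.75 and 0.8)
print(naive_temporal_agreement(pred))          # 0.75 (one of four pixels flips)
```

In the toy clip, a single mislabeled pixel in the second frame lowers both scores, which shows how a per-frame error also surfaces as a temporal inconsistency.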

Papers

Showing 101–125 of 895 papers

Title	Date	Tasks	Status	Hype
VideoMAC: Video Masked Autoencoders Meet ConvNets	Feb 29, 2024	Pose Tracking, Representation Learning	Code Available	1
Lester: rotoscope animation through video object segmentation and tracking	Feb 15, 2024	3D Human Pose Estimation, Object	Code Available	1
We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline	Feb 1, 2024	Benchmarking, Domain Adaptation	Code Available	1
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation	Jan 1, 2024	Object, Referring Video Object Segmentation	Code Available	1
Tracking with Human-Intent Reasoning	Dec 29, 2023	Language Modelling, Object	Code Available	1
DVIS++: Improved Decoupled Framework for Universal Video Segmentation	Dec 20, 2023	Contrastive Learning, Denoising	Code Available	1
AutoVisual Fusion Suite: A Comprehensive Evaluation of Image Segmentation and Voice Conversion Tools on HuggingFace Platform	Dec 17, 2023	Image Segmentation, Segmentation	Code Available	1
Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning	Dec 1, 2023	Decoder, object-detection	Code Available	1
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories	Nov 30, 2023	GPU, Object	Code Available	1
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation	Nov 29, 2023	Clustering, Object	Code Available	1
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation	Nov 24, 2023	Meta-Learning, One-Shot Segmentation	Code Available	1
Unified Domain Adaptive Semantic Segmentation	Nov 22, 2023	Data Augmentation, Optical Flow Estimation	Code Available	1
Concatenated Masked Autoencoders as Spatial-Temporal Learner	Nov 2, 2023	Action Recognition, Data Augmentation	Code Available	1
Mask Propagation for Efficient Video Semantic Segmentation	Oct 29, 2023	Semantic Segmentation, Video Semantic Segmentation	Code Available	1
Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation	Sep 26, 2023	Object, Optical Flow Estimation	Code Available	1
MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography	Sep 24, 2023	Image Segmentation, Medical Image Segmentation	Code Available	1
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation	Sep 21, 2023	Autonomous Driving, Segmentation	Code Available	1
GraphEcho: Graph-Driven Unsupervised Domain Adaptation for Echocardiogram Video Segmentation	Sep 20, 2023	Domain Adaptation, Graph Matching	Code Available	1
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation	Sep 18, 2023	Video Segmentation, Video Semantic Segmentation	Code Available	1
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation	Aug 25, 2023	Object, Object Tracking	Code Available	1
LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark	Aug 18, 2023	Diversity, Panoptic Segmentation	Code Available	1
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation	Aug 13, 2023	Semantic Segmentation, Video Object Segmentation	Code Available	1
Stochastic positional embeddings improve masked image modeling	Jul 31, 2023	Language Modelling, Masked Language Modeling	Code Available	1
Spectrum-guided Multi-granularity Referring Video Object Segmentation	Jul 25, 2023	Object, Referring Expression Segmentation	Code Available	1
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation	Jul 18, 2023	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	1

Datasets: Cityscapes val, CamVid, VSPW, LaRS, Multispectral Video Semantic Segmentation

Benchmark Results

Cityscapes val

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

CamVid

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

VSPW

#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

LaRS

#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

Multispectral Video Semantic Segmentation

#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified