Video Semantic Segmentation

The goal of video semantic segmentation is to assign one of a set of predefined classes to every pixel in every frame of a video. The model must therefore predict accurate segmentation masks and also keep those masks temporally consistent across frames. The task has broad applications in areas such as autonomous driving, medical video analysis, and AR/VR.
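
To make the temporal-consistency requirement concrete, here is a deliberately naive sketch in Python. It treats each frame's prediction as a grid of class ids and measures the fraction of pixels whose label is unchanged between consecutive frames. The function name `frame_agreement` is hypothetical, and the measure ignores camera and object motion, so it is only a crude proxy; published consistency metrics typically warp one frame's prediction onto the next with optical flow before comparing.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # H x W grid of predicted class ids


def frame_agreement(preds: Sequence[Frame]) -> float:
    """Fraction of pixels whose label is unchanged between consecutive frames.

    Assumption: frames share the same size and no motion compensation is applied.
    """
    same = 0
    total = 0
    for prev, curr in zip(preds, preds[1:]):
        for prev_row, curr_row in zip(prev, curr):
            for a, b in zip(prev_row, curr_row):
                same += int(a == b)
                total += 1
    # With fewer than two frames there is nothing to compare.
    return same / total if total else 1.0


# Two 2x2 frames; one pixel flips from class 0 to class 1.
video = [
    [[0, 0], [1, 1]],
    [[0, 1], [1, 1]],
]
print(frame_agreement(video))
```

A value near 1.0 on a static scene suggests stable predictions. On scenes with real motion a lower value is expected even from a good model, which is why flow-based variants are usually preferred.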

Papers

Showing 126–150 of 895 papers

Title	Date	Tasks	Status	Hype
NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation	Jul 17, 2023	3D Reconstruction, Depth Estimation	Code Available	1
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation	Jul 3, 2023	Image Segmentation, Referring Expression	Code Available	1
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation	Jun 14, 2023	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	1
3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW	Jun 4, 2023	Position, Segmentation	Code Available	1
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation	May 26, 2023	cross-modal alignment, Object	Code Available	1
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation	May 25, 2023	Object, Referring Expression Segmentation	Code Available	1
UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model	May 22, 2023	Image Segmentation, Object	Code Available	1
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1
Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping	Apr 17, 2023	Motion Segmentation, Object	Code Available	1
Segment Everything Everywhere All at Once	Apr 13, 2023	All, Decoder	Code Available	1
Boosting Video Object Segmentation via Space-time Correspondence Learning	Apr 13, 2023	Object, Segmentation	Code Available	1
DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks	Apr 2, 2023	Diversity, Object Tracking	Code Available	1
Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation	Mar 25, 2023	Semantic Segmentation, Video Object Segmentation	Code Available	1
CrOC: Cross-View Online Clustering for Dense Visual Representation Learning	Mar 23, 2023	Clustering, Online Clustering	Code Available	1
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation	Mar 22, 2023	Contrastive Learning, Segmentation	Code Available	1
Two-shot Video Object Segmentation	Mar 21, 2023	Object, Pseudo Label	Code Available	1
Adaptive Multi-source Predictor for Zero-shot Video Object Segmentation	Mar 18, 2023	Object, Optical Flow Estimation	Code Available	1
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation	Mar 16, 2023	Knowledge Distillation, Open Vocabulary Semantic Segmentation	Code Available	1
Guided Slot Attention for Unsupervised Video Object Segmentation	Mar 15, 2023	Object, Semantic Segmentation	Code Available	1
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos	Mar 13, 2023	Segmentation, Semantic Segmentation	Code Available	1
Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation	Feb 22, 2023	Decoder, Image Segmentation	Code Available	1
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation	Feb 14, 2023	Decoder, Image Segmentation	Code Available	1
Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction	Feb 7, 2023	Instance Segmentation, Multi-Object Tracking	Code Available	1
TarViS: A Unified Approach for Target-based Video Segmentation	Jan 6, 2023	Instance Segmentation, Panoptic Segmentation	Code Available	1
End-to-End Video Matting With Trimap Propagation	Jan 1, 2023	Image Matting, Segmentation	Code Available	1

Datasets: Cityscapes val, CamVid, VSPW, LaRS, Multispectral Video Semantic Segmentation

Benchmark Results

Cityscapes val
#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

CamVid
#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

VSPW
#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

LaRS
#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

Multispectral Video Semantic Segmentation
#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified
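
Most of the tables above report mIoU (mean intersection over union, also listed as Mean IoU). As a rough sketch of how this metric is commonly computed, the Python snippet below accumulates per-class intersection and union counts over all frames and averages the per-class ratios. The function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes that never occur are assumptions made for illustration; individual benchmarks may differ in these details.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # H x W grid of class ids


def mean_iou(preds: Sequence[Frame], targets: Sequence[Frame],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean of per-class IoU, with counts accumulated over every frame.

    Assumption: target labels are either in [0, num_classes) or ignore_index.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, target in zip(preds, targets):
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if t == ignore_index:
                    continue  # unlabeled pixel, excluded from the score
                if p == t:
                    inter[t] += 1
                    union[t] += 1
                else:
                    union[t] += 1  # missed pixel of class t
                    if 0 <= p < num_classes:
                        union[p] += 1  # false alarm for class p
    # Classes absent from both prediction and ground truth are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


preds = [[[0, 0], [1, 1]], [[0, 1], [1, 1]]]
targets = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
print(round(100 * mean_iou(preds, targets, num_classes=2), 1))
```

The result is a fraction in [0, 1]; the leaderboards list it as a percentage, hence the multiplication by 100 in the usage line.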