Video Semantic Segmentation

The goal of video semantic segmentation is to assign one of a predefined set of classes to every pixel in every frame of a video. The model must therefore predict accurate segmentation masks and also keep those masks temporally consistent across frames. The task has applications in autonomous driving, medical video analysis, and AR/VR.
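The two requirements above, per-pixel accuracy and temporal consistency, can be illustrated with a minimal sketch. It computes mean IoU (mIoU), the metric reported in most of the benchmark tables on this page, and a simple label-stability score. The names `mean_iou` and `label_stability` are hypothetical, and the stability score is an assumption made for illustration: it compares the same pixel location in consecutive frames and ignores motion, whereas published temporal-consistency measures typically warp predictions with optical flow first.

```python
from typing import List, Sequence

Frame = List[List[int]]  # H x W grid of class ids for one frame


def mean_iou(pred: Sequence[Frame], target: Sequence[Frame], num_classes: int) -> float:
    """Mean IoU over classes, accumulated over every pixel of every frame."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_frame, target_frame in zip(pred, target):
        for pred_row, target_row in zip(pred_frame, target_frame):
            for p, t in zip(pred_row, target_row):
                if p == t:
                    # Correct pixel: counts toward both intersection and union.
                    intersection[p] += 1
                    union[p] += 1
                else:
                    # Wrong pixel: enlarges the union of both classes involved.
                    union[p] += 1
                    union[t] += 1
    # Classes absent from both prediction and ground truth are skipped.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def label_stability(pred: Sequence[Frame]) -> float:
    """Fraction of pixel locations whose label is unchanged between
    consecutive frames (illustrative proxy, no motion compensation)."""
    same = 0
    total = 0
    for prev, curr in zip(pred, pred[1:]):
        for prev_row, curr_row in zip(prev, curr):
            for a, b in zip(prev_row, curr_row):
                same += int(a == b)
                total += 1
    return same / total if total else 1.0


# Toy two-frame clip with 2 x 2 frames and two classes.
target = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
pred = [[[0, 0], [1, 1]], [[0, 1], [1, 1]]]

miou = mean_iou(pred, target, num_classes=2)
stability = label_stability(pred)
print(f"mIoU={miou:.3f} stability={stability:.2f}")
```

In the toy clip a single wrong pixel in the second frame lowers both scores, which is the kind of flicker a video model is expected to suppress.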

Papers

Showing 201–225 of 895 papers

Title	Date	Tasks	Status	Hype	Score
DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks	Apr 2, 2023	Diversity, Object Tracking	Code Available	1	5
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
Fast Template Matching and Update for Video Object Tracking and Segmentation	Apr 16, 2020	Object Tracking, reinforcement-learning	Code Available	1	5
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation	Jun 9, 2021	Semantic Segmentation, Semi-Supervised Video Object Segmentation	Code Available	1	5
A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation	Sep 19, 2022	Clustering, global-optimization	Code Available	1	5
DVIS++: Improved Decoupled Framework for Universal Video Segmentation	Dec 20, 2023	Contrastive Learning, Denoising	Code Available	1	5
Local-Global Context Aware Transformer for Language-Guided Video Segmentation	Mar 18, 2022	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	1	5
Lester: rotoscope animation through video object segmentation and tracking	Feb 15, 2024	3D Human Pose Estimation, Object	Code Available	1	5
Motion-Attentive Transition for Zero-Shot Video Object Segmentation	Mar 9, 2020	Decoder, Object	Code Available	1	5
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation	Jan 23, 2025	Referring Expression Segmentation, Referring Video Object Segmentation	Code Available	1	5
Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation	Jan 1, 2023	Instance Segmentation, Multi-Object Tracking	Code Available	1	5
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation	Apr 6, 2022	Optical Flow Estimation, Referring Expression Segmentation	Code Available	1	5
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion	Mar 14, 2021	Interactive Video Object Segmentation, Semantic Segmentation	Code Available	1	5
See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks	Jan 19, 2020	Semantic Segmentation, Unsupervised Video Object Segmentation	Code Available	1	5
Concatenated Masked Autoencoders as Spatial-Temporal Learner	Nov 2, 2023	Action Recognition, Data Augmentation	Code Available	1	5
Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning	Dec 1, 2023	Decoder, object-detection	Code Available	1	5
Fast Video Object Segmentation using the Global Context Module	Jan 30, 2020	Object, Segmentation	Code Available	1	5
Efficient Regional Memory Network for Video Object Segmentation	Mar 24, 2021	Object, One-shot visual object segmentation	Code Available	1	5
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos	Mar 13, 2023	Segmentation, Semantic Segmentation	Code Available	1	5
Efficient Semantic Video Segmentation with Per-frame Inference	Feb 26, 2020	Knowledge Distillation, Optical Flow Estimation	Code Available	1	5
RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation	Mar 8, 2022	Classification, Instance Segmentation	Code Available	1	5
CamSAM2: Segment Anything Accurately in Camouflaged Videos	Mar 25, 2025	Camouflaged Object Segmentation, Object	Code Available	1	5
Multi-Attention Network for Compressed Video Referring Object Segmentation	Jul 26, 2022	Object, Referring Expression Segmentation	Code Available	1	5
Exploiting Temporal State Space Sharing for Video Semantic Segmentation	Mar 26, 2025	Mamba, Semantic Segmentation	Code Available	1	5
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration	Oct 13, 2020	Object, One-shot visual object segmentation	Code Available	1	5

Datasets: Cityscapes val, CamVid, VSPW, LaRS, Multispectral Video Semantic Segmentation

Benchmark Results

Cityscapes val
#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

CamVid
#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

VSPW
#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

LaRS
#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

Multispectral Video Semantic Segmentation
#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified