Video Semantic Segmentation

The goal of video semantic segmentation is to assign a predefined class to each pixel in all frames of a video. This requires the model not only to predict accurate segmentation masks but also to ensure that these masks remain temporally consistent across frames. This task has broad applications in areas such as autonomous driving, medical video analysis, and AR/VR.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–50 of 895 papers

Title	Date	Tasks	Status	Hype
SAM 2: Segment Anything in Images and Videos	Aug 1, 2024	Image SegmentationRobot Manipulation Generalization	CodeCode Available	12
Efficient Track Anything	Nov 28, 2024	ObjectSegmentation	CodeCode Available	7
Segment Anything in Medical Images and Videos: Benchmark and Deployment	Aug 6, 2024	BenchmarkingSegmentation	CodeCode Available	7
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation	Apr 7, 2025	Inference OptimizationReferring Video Object Segmentation	CodeCode Available	5
4th PVUW MeViS 3rd Place Report: Sa2VA	Apr 1, 2025	Language ModelingLanguage Modelling	CodeCode Available	5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Jan 7, 2025	2kLanguage Modeling	CodeCode Available	5
Underwater Camouflaged Object Tracking Meets Vision-Language SAM2	Sep 25, 2024	ObjectObject Tracking	CodeCode Available	5
Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey	Aug 23, 2024	Image SegmentationSegmentation	CodeCode Available	5
OMG-Seg: Is One Model Good Enough For All Segmentation?	Jan 18, 2024	AllDecoder	CodeCode Available	5
MedSAM2: Segment Anything in 3D Medical Images and Videos	Apr 4, 2025	SegmentationVideo Segmentation	CodeCode Available	4
EdgeTAM: On-Device Track Anything Model	Jan 13, 2025	modelVideo Segmentation	CodeCode Available	4
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree	Oct 21, 2024	Heuristic SearchObject	CodeCode Available	4
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results	Jun 24, 2024	SegmentationSemantic Segmentation	CodeCode Available	4
SegGPT: Segmenting Everything In Context	Apr 6, 2023	Few-Shot Semantic SegmentationIn-Context Learning	CodeCode Available	4
SiamMask: A Framework for Fast Online Object Tracking and Segmentation	Jul 5, 2022	Multiple Object TrackingObject	CodeCode Available	4
Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes	Dec 2, 2024	In-Context LearningVideo Segmentation	CodeCode Available	3
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation	Nov 26, 2024	Natural Language UnderstandingReferring Video Object Segmentation	CodeCode Available	3
SMITE: Segment Me In TimE	Oct 24, 2024	SegmentationSemantic Segmentation	CodeCode Available	3
Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2	Aug 3, 2024	DiversitySegmentation	CodeCode Available	3
VISA: Reasoning Video Object Segmentation via Large Language Models	Jul 16, 2024	DecoderObject	CodeCode Available	3
Moving Object Segmentation: All You Need Is SAM (and Flow)	Apr 18, 2024	AllMotion Segmentation	CodeCode Available	3
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model	Mar 21, 2024	DecoderGeneralized Referring Expression Segmentation	CodeCode Available	3
UniVS: Unified and Universal Video Segmentation with Prompts as Queries	Feb 28, 2024	DecoderReferring Expression Segmentation	CodeCode Available	3
RAP-SAM: Towards Real-Time All-Purpose Segment Anything	Jan 18, 2024	AllDecoder	CodeCode Available	3
Putting the Object Back into Video Object Segmentation	Oct 19, 2023	ObjectSegmentation	CodeCode Available	3
Tracking Anything with Decoupled Video Segmentation	Sep 7, 2023	Open-Vocabulary Video SegmentationOpen-World Video Segmentation	CodeCode Available	3
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation	Aug 28, 2023	Instance SegmentationOptical Flow Estimation	CodeCode Available	3
Personalize Segment Anything Model with One Shot	May 4, 2023	Image Generationmodel	CodeCode Available	3
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model	Jul 14, 2022	2D Human Pose Estimation2D Object Detection	CodeCode Available	3
Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation	Mar 29, 2022	Contrastive LearningSegmentation	CodeCode Available	3
VideoMolmo: Spatio-Temporal Grounding Meets Pointing	Jun 5, 2025	Autonomous DrivingAutonomous Navigation	CodeCode Available	2
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration	May 26, 2025	Domain GeneralizationHallucination	CodeCode Available	2
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Apr 10, 2025	Contrastive LearningLanguage Modeling	CodeCode Available	2
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation	Mar 5, 2025	ObjectReferring Video Object Segmentation	CodeCode Available	2
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver	Jan 1, 2025	Reasoning SegmentationSegmentation	CodeCode Available	2
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning SegmentationSegmentation	CodeCode Available	2
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity	Dec 9, 2024	Anomaly Detectiontext annotation	CodeCode Available	2
Det-SAM2:Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2	Nov 28, 2024	Video SegmentationVideo Semantic Segmentation	CodeCode Available	2
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos	Nov 18, 2024	Pose EstimationSemantic Segmentation	CodeCode Available	2
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos	Sep 29, 2024	AllImage Segmentation	CodeCode Available	2
Self-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo-SAM 2 Model	Sep 14, 2024	Medical Image SegmentationPolyp Segmentation	CodeCode Available	2
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation	Aug 28, 2024	ObjectSemantic Segmentation	CodeCode Available	2
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning	Aug 15, 2024	SegmentationVideo Segmentation	CodeCode Available	2
Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models	May 27, 2024	SegmentationSemantic correspondence	CodeCode Available	2
LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation	Apr 30, 2024	AttributeSemantic Segmentation	CodeCode Available	2
Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation	Apr 21, 2024	Semantic SegmentationVideo Object Segmentation	CodeCode Available	2
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation	Apr 4, 2024	Contrastive LearningReferring Expression	CodeCode Available	2
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries	Mar 29, 2024	ObjectVideo Instance Segmentation	CodeCode Available	2
Efficient Video Object Segmentation via Modulated Cross-Attention Memory	Mar 26, 2024	GPUObject	CodeCode Available	2
Vivim: a Video Vision Mamba for Medical Video Segmentation	Jan 25, 2024	Lesion SegmentationMamba	CodeCode Available	2

Show:10 25 50

← PrevPage 1 of 18Next →

All datasets Cityscapes val CamVid VSPW LaRS Multispectral Video Semantic Segmentation

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified