Video Semantic Segmentation

The goal of video semantic segmentation is to assign one class from a predefined set to every pixel in every frame of a video. This requires the model not only to predict accurate segmentation masks but also to keep those masks temporally consistent across frames. The task has broad applications in areas such as autonomous driving, medical video analysis, and AR/VR.
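As a rough illustration of the two requirements above, the following sketch computes mean intersection-over-union (mIoU) over all frames of a clip and a naive label-stability score between consecutive frames. The function names `mean_iou` and `label_stability` and the nested-list frame format are assumptions made for this example, not part of any benchmark toolkit. The stability score uses no motion compensation, so it is only a crude proxy for temporal consistency.

```python
from typing import List, Sequence

# Hypothetical frame format for this sketch: one class id per pixel, row-major.
Frame = List[List[int]]


def mean_iou(preds: Sequence[Frame], targets: Sequence[Frame], num_classes: int) -> float:
    """Mean IoU with intersections and unions accumulated over every frame."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, target in zip(preds, targets):
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if p == t:
                    # Correct pixel: counts once toward intersection and union.
                    inter[p] += 1
                    union[p] += 1
                else:
                    # Wrong pixel: enlarges the union of both classes involved.
                    union[p] += 1
                    union[t] += 1
    # Classes that never occur in predictions or targets are skipped.
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


def label_stability(preds: Sequence[Frame]) -> float:
    """Fraction of pixels whose predicted label is unchanged between
    consecutive frames (naive proxy, no motion compensation)."""
    same = 0
    total = 0
    for prev, curr in zip(preds, preds[1:]):
        for prev_row, curr_row in zip(prev, curr):
            for a, b in zip(prev_row, curr_row):
                same += int(a == b)
                total += 1
    return same / total if total else 1.0
```

A model can score well on the first measure while flickering between frames, which is why the two are reported separately in this sketch.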

Papers

Showing 1–50 of 895 papers

Title	Date	Tasks	Status	Hype	Score
SAM 2: Segment Anything in Images and Videos	Aug 1, 2024	Image Segmentation, Robot Manipulation Generalization	Code Available	12	5
Segment Anything in Medical Images and Videos: Benchmark and Deployment	Aug 6, 2024	Benchmarking, Segmentation	Code Available	7	5
Efficient Track Anything	Nov 28, 2024	Object, Segmentation	Code Available	7	5
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Jan 7, 2025	2k, Language Modeling	Code Available	5	5
OMG-Seg: Is One Model Good Enough For All Segmentation?	Jan 18, 2024	All, Decoder	Code Available	5	5
Underwater Camouflaged Object Tracking Meets Vision-Language SAM2	Sep 25, 2024	Object, Object Tracking	Code Available	5	5
4th PVUW MeViS 3rd Place Report: Sa2VA	Apr 1, 2025	Language Modeling, Language Modelling	Code Available	5	5
Unleashing the Potential of SAM2 for Biomedical Images and Videos: A Survey	Aug 23, 2024	Image Segmentation, Segmentation	Code Available	5	5
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation	Apr 7, 2025	Inference Optimization, Referring Video Object Segmentation	Code Available	5	5
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree	Oct 21, 2024	Heuristic Search, Object	Code Available	4	5
MedSAM2: Segment Anything in 3D Medical Images and Videos	Apr 4, 2025	Segmentation, Video Segmentation	Code Available	4	5
SegGPT: Segmenting Everything In Context	Apr 6, 2023	Few-Shot Semantic Segmentation, In-Context Learning	Code Available	4	5
EdgeTAM: On-Device Track Anything Model	Jan 13, 2025	model, Video Segmentation	Code Available	4	5
SiamMask: A Framework for Fast Online Object Tracking and Segmentation	Jul 5, 2022	Multiple Object Tracking, Object	Code Available	4	5
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results	Jun 24, 2024	Segmentation, Semantic Segmentation	Code Available	4	5
SMITE: Segment Me In TimE	Oct 24, 2024	Segmentation, Semantic Segmentation	Code Available	3	5
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation	Nov 26, 2024	Natural Language Understanding, Referring Video Object Segmentation	Code Available	3	5
RAP-SAM: Towards Real-Time All-Purpose Segment Anything	Jan 18, 2024	All, Decoder	Code Available	3	5
Personalize Segment Anything Model with One Shot	May 4, 2023	Image Generation, model	Code Available	3	5
Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2	Aug 3, 2024	Diversity, Segmentation	Code Available	3	5
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model	Jul 14, 2022	2D Human Pose Estimation, 2D Object Detection	Code Available	3	5
Min-Max Similarity: A Contrastive Semi-Supervised Deep Learning Network for Surgical Tools Segmentation	Mar 29, 2022	Contrastive Learning, Segmentation	Code Available	3	5
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation	Aug 28, 2023	Instance Segmentation, Optical Flow Estimation	Code Available	3	5
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model	Mar 21, 2024	Decoder, Generalized Referring Expression Segmentation	Code Available	3	5
Putting the Object Back into Video Object Segmentation	Oct 19, 2023	Object, Segmentation	Code Available	3	5
Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes	Dec 2, 2024	In-Context Learning, Video Segmentation	Code Available	3	5
VISA: Reasoning Video Object Segmentation via Large Language Models	Jul 16, 2024	Decoder, Object	Code Available	3	5
UniVS: Unified and Universal Video Segmentation with Prompts as Queries	Feb 28, 2024	Decoder, Referring Expression Segmentation	Code Available	3	5
Tracking Anything with Decoupled Video Segmentation	Sep 7, 2023	Open-Vocabulary Video Segmentation, Open-World Video Segmentation	Code Available	3	5
Moving Object Segmentation: All You Need Is SAM (and Flow)	Apr 18, 2024	All, Motion Segmentation	Code Available	3	5
Self-Prompting Polyp Segmentation in Colonoscopy using Hybrid Yolo-SAM 2 Model	Sep 14, 2024	Medical Image Segmentation, Polyp Segmentation	Code Available	2	5
Audio-Visual Segmentation with Semantics	Jan 30, 2023	Segmentation, Semantic Segmentation	Code Available	2	5
Decoupling Features in Hierarchical Propagation for Video Object Segmentation	Oct 18, 2022	Object, Semantic Segmentation	Code Available	2	5
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning	Aug 15, 2024	Segmentation, Video Segmentation	Code Available	2	5
MOSE: A New Dataset for Video Object Segmentation in Complex Scenes	Feb 3, 2023	Object, Segmentation	Code Available	2	5
Scalable Video Object Segmentation with Identification Mechanism	Mar 22, 2022	Object, Segmentation	Code Available	2	5
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions	Aug 16, 2023	Motion Expressions Guided Video Segmentation, Object	Code Available	2	5
Mask2Former for Video Instance Segmentation	Dec 20, 2021	Image Segmentation, Instance Segmentation	Code Available	2	5
Language as Queries for Referring Video Object Segmentation	Jan 3, 2022	Object, Object Tracking	Code Available	2	5
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning Segmentation, Segmentation	Code Available	2	5
LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation	Apr 30, 2024	Attribute, Semantic Segmentation	Code Available	2	5
MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation	Sep 9, 2022	Segmentation, Semantic Segmentation	Code Available	2	5
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation	Jan 1, 2024	Segmentation, Video Segmentation	Code Available	2	5
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration	May 26, 2025	Domain Generalization, Hallucination	Code Available	2	5
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos	Nov 18, 2024	Pose Estimation, Semantic Segmentation	Code Available	2	5
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver	Jan 1, 2025	Reasoning Segmentation, Segmentation	Code Available	2	5
In Defense of Online Models for Video Instance Segmentation	Jul 21, 2022	Contrastive Learning, Instance Segmentation	Code Available	2	5
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	Apr 10, 2025	Contrastive Learning, Language Modeling	Code Available	2	5
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation	Mar 5, 2025	Object, Referring Video Object Segmentation	Code Available	2	5
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity	Dec 9, 2024	Anomaly Detection, text annotation	Code Available	2	5


Datasets: Cityscapes val, CamVid, VSPW, LaRS, Multispectral Video Semantic Segmentation

Benchmark Results

Cityscapes val

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	mIoU	80.3	—	Unverified
2	TDNet-50 [9]	mIoU	79.9	—	Unverified
3	DeltaDist-DDRNet-39	mIoU	79.9	—	Unverified
4	PSPNet-101 [20]	mIoU	79.7	—	Unverified
5	PSPNet-50 [20]	mIoU	78.1	—	Unverified
6	LVS [12]	mIoU	76.8	—	Unverified
7	GRFP [15]	mIoU	73.6	—	Unverified
8	FCN-50 [14]	mIoU	70.1	—	Unverified
9	DFF [22]	mIoU	69.2	—	Unverified

CamVid

#	Model	Metric	Claimed	Verified	Status
1	TMANet-50	Mean IoU	76.5	—	Unverified
2	ETC-MobileNet	Mean IoU	76.3	—	Unverified
3	TDNet-50	Mean IoU	76.2	—	Unverified
4	PSPNet-50	Mean IoU	76	—	Unverified
5	Netwarp	Mean IoU	74.7	—	Unverified
6	GRFP	Mean IoU	67.1	—	Unverified

VSPW

#	Model	Metric	Claimed	Verified	Status
1	DVIS++(VIT-L)	mIoU	63.8	—	Unverified
2	UniVS(Swin-L)	mIoU	59.8	—	Unverified
3	Tube-Link(Swin-large)	mIoU	59.6	—	Unverified
4	MRCFA(MiT-B5)	mIoU	49.9	—	Unverified
5	CFFM(MiT-B5)	mIoU	49.3	—	Unverified

LaRS

#	Model	Metric	Claimed	Verified	Status
1	WaSR-T (ResNet-101)	Q	60.1	—	Unverified
2	TMANet (ResNet-50)	Q	57.5	—	Unverified
3	CSANet (ResNet-101)	Q	49.1	—	Unverified

Multispectral Video Semantic Segmentation

#	Model	Metric	Claimed	Verified	Status
1	MVNet(DeepLabV3)	mIoU	54.52	—	Unverified
2	MVNet(PSPNet)	mIoU	54.36	—	Unverified
3	MVNet(FCN)	mIoU	53.9	—	Unverified