SOTAVerified

Video Instance Segmentation

The goal of video instance segmentation is the simultaneous detection, segmentation and tracking of instances in videos. In other words, it extends the image instance segmentation problem to the video domain for the first time.

To facilitate research on this new task, a large-scale benchmark called YouTube-VIS was built. It consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks.
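As a rough illustration of the task definition above, the sketch below models one instance as a class label, a confidence score and one mask per frame, and scores a prediction against ground truth with an IoU accumulated over all frames. The names `InstanceTrack` and `video_iou`, and the nested-list mask format, are hypothetical and chosen only for this example. Summing intersections and unions over frames is the spatio-temporal IoU commonly used for video mask AP, and it is assumed here rather than taken from this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One binary mask per frame, stored as rows of 0/1 values (assumed format).
Mask = List[List[int]]


@dataclass
class InstanceTrack:
    """One instance followed through a whole video (hypothetical container)."""
    category: str                 # class label (detection)
    score: float                  # confidence for the whole track
    masks: List[Optional[Mask]]   # per-frame mask, None where the instance is absent


def _frame_overlap(a: Optional[Mask], b: Optional[Mask]) -> Tuple[int, int]:
    """Return (intersection, union) pixel counts for a single frame."""
    if a is None and b is None:
        return 0, 0
    if a is None:
        return 0, sum(map(sum, b))
    if b is None:
        return 0, sum(map(sum, a))
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter, union


def video_iou(pred: InstanceTrack, gt: InstanceTrack) -> float:
    """IoU accumulated over every frame, so a missed frame lowers the score."""
    inter_total = 0
    union_total = 0
    for a, b in zip(pred.masks, gt.masks):
        inter, union = _frame_overlap(a, b)
        inter_total += inter
        union_total += union
    return inter_total / union_total if union_total else 0.0


# Two-frame toy video: the prediction over-segments frame 0 and loses the
# instance entirely in frame 1.
pred = InstanceTrack("person", 0.9, [[[1, 1], [0, 0]], None])
gt = InstanceTrack("person", 1.0, [[[1, 0], [0, 0]], [[1, 1], [0, 0]]])
print(video_iou(pred, gt))
```

Because the overlap is pooled across frames, a track that segments well but loses the instance partway through the video is penalized, which is what distinguishes the video task from per-frame instance segmentation.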

Papers

Showing 1-50 of 148 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Segment Anything Meets Point Tracking | Code | 3
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | Code | 3
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | Code | 2
Language as Queries for Referring Video Object Segmentation | Code | 2
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations | Code | 2
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | Code | 2
Mask-Free Video Instance Segmentation | Code | 2
Occlusion-Aware Instance Segmentation via BiLayer Network Architectures | Code | 2
Temporally Efficient Vision Transformer for Video Instance Segmentation | Code | 2
Mask2Former for Video Instance Segmentation | Code | 2
In Defense of Online Models for Video Instance Segmentation | Code | 2
Video Instance Segmentation | Code | 2
Context-Aware Video Instance Segmentation | Code | 2
Simple Online and Realtime Tracking with a Deep Association Metric | Code | 1
SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation | Code | 1
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | Code | 1
CTVIS: Consistent Training for Online Video Instance Segmentation | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
UVO Challenge on Video-based Open-World Segmentation 2021: 1st Place Solution | Code | 1
MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | Code | 1
RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation | Code | 1
D2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos | Code | 1
MouseSIS: A Frames-and-Events Dataset for Space-Time Instance Segmentation of Mice | Code | 1
Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | Code | 1
Occluded Video Instance Segmentation: A Benchmark | Code | 1
DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | Code | 1
Real-time Human-Centric Segmentation for Complex Video Scenes | Code | 1
Context-Aware Relative Object Queries To Unify Video Instance and Panoptic Segmentation | Code | 1
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation | Code | 1
Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency | Code | 1
Instance-wise Depth and Motion Learning from Monocular Videos | Code | 1
Instances as Queries | Code | 1
Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation | Code | 1
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation | Code | 1
CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | Code | 1
CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation | Code | 1
End-to-End Video Instance Segmentation with Transformers | Code | 1
A Generalized Framework for Video Instance Segmentation | Code | 1
DVIS++: Improved Decoupled Framework for Universal Video Segmentation | Code | 1
Improving Video Instance Segmentation via Temporal Pyramid Routing | Code | 1
Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency | Code | 1
DVIS: Decoupled Video Instance Segmentation Framework | Code | 1
1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation | Code | 1
Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation | Code | 1
Implicit Feature Refinement for Instance Segmentation | Code | 1
Do Different Tracking Tasks Require Different Appearance Models? | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | DVIS-DAQ(VIT-L, Offline) | mask AP | 57.1 | | Unverified
2 | CAVIS(VIT-L, Offline) | mask AP | 57.1 | | Unverified
3 | DVIS++(VIT-L,Offline) | mask AP | 53.4 | | Unverified
4 | GLEE-Pro | mask AP | 50.4 | | Unverified
5 | DVIS(Swin-L, Offline) | mask AP | 49.9 | | Unverified
6 | DVIS++(VIT-L, Online) | mask AP | 49.6 | | Unverified
7 | UNINEXT (ViT-H, Online) | mask AP | 49 | | Unverified
8 | DVIS(Swin-L, Online) | mask AP | 47.1 | | Unverified
9 | CTVIS (Swin-L) | mask AP | 46.9 | | Unverified
10 | RefineVIS (Swin-L, offline) | mask AP | 46 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CAVIS(ViT-L, Online) | mask AP | 68.9 | | Unverified
2 | DVIS++(ViT-L, Online) | mask AP | 67.7 | | Unverified
3 | DVIS | mask AP | 64.9 | | Unverified
4 | Tube-Link | mask AP | 64.6 | | Unverified
5 | MinVIS (Swin-L) | mask AP | 61.6 | | Unverified
6 | Mask2Former (Swin-L) | mask AP | 60.4 | | Unverified
7 | UniVS(Swin-L) | mask AP | 60 | | Unverified
8 | MDQE(Swin-L) | mask AP | 59.9 | | Unverified
9 | SeqFormer (Swin-L) | mask AP | 59.3 | | Unverified
10 | DeVIS (Swin-L) | mask AP | 57.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CAVIS(VIT-L, Offline) | mask AP | 65.3 | | Unverified
2 | DVIS-DAQ(VIT-L, Offline) | mask AP | 64.5 | | Unverified
3 | DVIS++(VIT-L, Offline) | mask AP | 63.9 | | Unverified
4 | DVIS++(VIT-L, Online) | mask AP | 62.3 | | Unverified
5 | RefineVIS (Swin-L, online) | mask AP | 61.4 | | Unverified
6 | GRAtt-VIS (Swin-L) | mask AP | 60.3 | | Unverified
7 | TarViS (Swin-L) | mask AP | 60.2 | | Unverified
8 | GenVIS (Swin-L) | mask AP | 60.1 | | Unverified
9 | DVIS(Swin-L) | mask AP | 60.1 | | Unverified
10 | NOVIS (Swin-L) | mask AP | 59.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | DVIS++(VIT-L) | mAP_L | 50.9 | | Unverified
2 | CAVIS (VIT-L) | mAP_L | 48.6 | | Unverified
3 | CTVIS (Swin-L) | mAP_L | 46.4 | | Unverified
4 | DVIS(Swin-L) | mAP_L | 45.9 | | Unverified
5 | CTVIS (ResNet-50) | mAP_L | 39.4 | | Unverified
6 | InstanceFormer (Swin) | mAP_L | 26.3 | | Unverified
7 | InstanceFormer (Resnet-50) | mAP_L | 24.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PCAN | mMOTSA | 27.4 | | Unverified
2 | QDTrack-mots-fix | mMOTSA | 23.5 | | Unverified
3 | QDTrack-mots | mMOTSA | 22.5 | | Unverified
4 | MaskTrackRCNN | mMOTSA | 12.3 | | Unverified
5 | STEm-Seg | mMOTSA | 12.2 | | Unverified
6 | SortIoU | mMOTSA | 10.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VMT (Swin-L) | Tube-Boundary AP | 44.8 | | Unverified
2 | SeqFormer (Swin-L) | Tube-Boundary AP | 43.3 | | Unverified
3 | VMT (R101) | Tube-Boundary AP | 32.5 | | Unverified
4 | VMT (R50) | Tube-Boundary AP | 30.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Temporal ROI Align | mask AP | 38 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaskFreeVIS | AP | 55.3 | | Unverified