SOTAVerified

Instance Segmentation

Instance segmentation is a computer vision task that identifies and separates the individual objects in an image: it delineates the boundary of each object and assigns each one a unique label. The goal is a pixel-wise segmentation map in which every pixel that belongs to an object is assigned to a specific object instance.
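As a rough illustration of what "each pixel is assigned to a specific object instance" means, the sketch below splits a toy instance map into one binary mask per object. The grid, the `split_instances` helper, and the convention that 0 marks background are assumptions made for this example; real models often emit per-instance masks directly, and those masks can overlap.

```python
from typing import Dict, List

Grid = List[List[int]]


def split_instances(instance_map: Grid, background: int = 0) -> Dict[int, Grid]:
    """Return one binary mask per instance id found in the map (hypothetical helper)."""
    # Collect every distinct non-background id that appears in the map.
    ids = sorted({v for row in instance_map for v in row if v != background})
    # For each id, mark the pixels that carry it with 1 and everything else with 0.
    return {
        i: [[1 if v == i else 0 for v in row] for row in instance_map]
        for i in ids
    }


# Toy 4x5 instance map: two objects (ids 1 and 2) on background 0.
instance_map = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 2, 2],
]

masks = split_instances(instance_map)
# Pixel count (area) of each instance mask.
areas = {i: sum(map(sum, m)) for i, m in masks.items()}
print(areas)
```

Two objects of the same class would still receive different ids here, which is the distinction between instance segmentation and semantic segmentation.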

Image Credit: Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers, CVPR'21

Papers

Showing 1-50 of 2262 papers

Title | Status | Hype
YOLO-World: Real-Time Open-Vocabulary Object Detection | Code | 9
MambaOut: Do We Really Need Mamba for Vision? | Code | 7
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | Code | 7
YOLOR-Based Multi-Task Learning | Code | 5
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Code | 5
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything | Code | 4
RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Code | 4
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | Code | 4
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | Code | 4
GLIPv2: Unifying Localization and Vision-Language Understanding | Code | 4
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | Code | 4
InstanceDiffusion: Instance-level Control for Image Generation | Code | 4
Visual Attention Network | Code | 4
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels | Code | 4
Detectron2 Object Detection & Manipulating Images using Cartoonization | Code | 4
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Code | 4
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | Code | 4
Panoptic Feature Pyramid Networks | Code | 4
EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Code | 4
A Simple Framework for Open-Vocabulary Segmentation and Detection | Code | 3
ResNeSt: Split-Attention Networks | Code | 3
Vision Transformers: From Semantic Segmentation to Dense Prediction | Code | 3
XCiT: Cross-Covariance Image Transformers | Code | 3
Vision Transformer Adapter for Dense Predictions | Code | 3
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | Code | 3
DETRs with Collaborative Hybrid Assignments Training | Code | 3
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Code | 3
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | Code | 3
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface | Code | 3
OneFormer: One Transformer to Rule Universal Image Segmentation | Code | 3
No time to train! Training-Free Reference-Based Instance Segmentation | Code | 3
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Code | 3
Universal Instance Perception as Object Discovery and Retrieval | Code | 3
InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation | Code | 3
General Object Foundation Model for Images and Videos at Scale | Code | 3
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment | Code | 3
A Survey of Camouflaged Object Detection and Beyond | Code | 3
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks | Code | 3
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Code | 3
Cut and Learn for Unsupervised Object Detection and Instance Segmentation | Code | 3
Nuclei instance segmentation and classification in histopathology images with StarDist | Code | 3
Generalized Decoding for Pixel, Image, and Language | Code | 3
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation | Code | 3
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Code | 2
MogaNet: Multi-order Gated Aggregation Network | Code | 2
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation | Code | 2
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting | Code | 2
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks | Code | 2
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | InternImage-H | AP50 | 80.8 | | Unverified
2 | ResNeSt-200 (multi-scale) | AP50 | 70.2 | | Unverified
3 | CenterMask + VoVNetV2-99 (multi-scale) | AP50 | 66.2 | | Unverified
4 | CenterMask + VoVNetV2-57 (single-scale) | AP50 | 60.8 | | Unverified
5 | Co-DETR | mask AP | 57.1 | | Unverified
6 | CBNetV2 (EVA02, single-scale) | mask AP | 56.1 | | Unverified
7 | ISDA (ResNet-50) | APL | 55.7 | | Unverified
8 | EVA | mask AP | 55.5 | | Unverified
9 | FD-SwinV2-G | mask AP | 55.4 | | Unverified
10 | Mask Frozen-DETR | mask AP | 55.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | InternImage-B | GFLOPs | 501 | | Unverified
2 | Co-DETR | mask AP | 56.6 | | Unverified
3 | ViT-CoMer-L (Mask RCNN, DINOv2) | mask AP | 55.9 | | Unverified
4 | InternImage-H | mask AP | 55.4 | | Unverified
5 | EVA | mask AP | 55 | | Unverified
6 | Mask Frozen-DETR | mask AP | 54.9 | | Unverified
7 | Mask DINO (SwinL, multi-scale) | mask AP | 54.5 | | Unverified
8 | ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) | mask AP | 54.2 | | Unverified
9 | GLEE-Pro | mask AP | 54.2 | | Unverified
10 | SwinV2-G (HTC++) | mask AP | 53.7 | | Unverified
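
The AP50 and mask AP figures listed above are built on mask overlap between predictions and ground truth. The page does not state the evaluation protocol behind each entry, so the following is only a sketch under the common COCO-style convention, in which AP50 treats a predicted mask as a match when its intersection over union (IoU) with a ground-truth mask is at least 0.5. The `mask_iou` helper and the toy masks are illustrative, not taken from any benchmark toolkit.

```python
from typing import List

Mask = List[List[int]]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-sized binary masks (hypothetical helper)."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    inter = sum(x & y for x, y in pairs)  # pixels set in both masks
    union = sum(x | y for x, y in pairs)  # pixels set in either mask
    return inter / union if union else 0.0


# Toy 3x3 masks: the prediction is shifted one column left of the ground truth.
pred = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
gt = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]

iou = mask_iou(pred, gt)          # 2 shared pixels out of 6 covered pixels
matches_at_50 = iou >= 0.5        # assumed AP50 matching rule
print(round(iou, 3), matches_at_50)
```

A full AP computation would additionally rank predictions by confidence and integrate precision over recall; that part is omitted here.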