SOTAVerified

Panoptic Segmentation

Panoptic segmentation is a computer vision task that combines semantic segmentation and instance segmentation to provide a comprehensive understanding of a scene. The goal is to partition the image into semantically meaningful regions while also detecting and distinguishing individual object instances within those regions. Every pixel is assigned a semantic label; pixels belonging to "things" classes (countable objects with distinct instances, such as cars and people) additionally receive unique instance IDs, while "stuff" classes (amorphous regions such as sky and road) carry only a semantic label. (Image credit: Detectron2)
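The per-pixel labeling described above is commonly stored as a single panoptic map by packing the semantic label and the instance ID into one integer per pixel. A minimal sketch, assuming NumPy and the widely used label-divisor convention (the divisor value of 1000 here is illustrative, not prescribed by this page):

```python
import numpy as np

LABEL_DIVISOR = 1000  # illustrative; large enough to exceed the max instance count

def encode_panoptic(semantic: np.ndarray, instance: np.ndarray) -> np.ndarray:
    """Pack a semantic map and an instance map into one panoptic map.

    "Stuff" pixels (instance ID 0) keep only their semantic label;
    "things" pixels additionally carry a per-object instance ID.
    """
    return semantic.astype(np.int64) * LABEL_DIVISOR + instance.astype(np.int64)

def decode_panoptic(panoptic: np.ndarray):
    """Recover (semantic, instance) maps from a packed panoptic map."""
    return panoptic // LABEL_DIVISOR, panoptic % LABEL_DIVISOR

# Toy 2x2 image: class 1 is "stuff" (e.g. sky), class 2 is a "thing" (e.g. car).
semantic = np.array([[1, 1], [2, 2]])
instance = np.array([[0, 0], [1, 2]])  # two distinct cars
pan = encode_panoptic(semantic, instance)
sem, inst = decode_panoptic(pan)
assert (sem == semantic).all() and (inst == instance).all()
```

The packed form makes it cheap to look up both "which class" and "which object" for any pixel with two integer operations.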

Papers

Showing 1–50 of 462 papers (page 1 of 10)

| Title | Status | Hype |
|---|---|---|
| OMG-Seg: Is One Model Good Enough For All Segmentation? | Code | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5 |
| Detectron2 Object Detection & Manipulating Images using Cartoonization | Code | 4 |
| Scalable 3D Panoptic Segmentation As Superpoint Graph Clustering | Code | 4 |
| SegGPT: Segmenting Everything In Context | Code | 4 |
| Panoptic Feature Pyramid Networks | Code | 4 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | Code | 4 |
| Visual Attention Network | Code | 4 |
| 4D Panoptic Scene Graph Generation | Code | 3 |
| A Simple Framework for Open-Vocabulary Segmentation and Detection | Code | 3 |
| Generalized Decoding for Pixel, Image, and Language | Code | 3 |
| ResNeSt: Split-Attention Networks | Code | 3 |
| Vision Transformer Adapter for Dense Predictions | Code | 3 |
| Tracking Anything with Decoupled Video Segmentation | Code | 3 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | Code | 3 |
| PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Code | 3 |
| RAP-SAM: Towards Real-Time All-Purpose Segment Anything | Code | 3 |
| Aligning and Prompting Everything All at Once for Universal Visual Perception | Code | 2 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | Code | 2 |
| Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration | Code | 2 |
| PVO: Panoptic Visual Odometry | Code | 2 |
| PosSAM: Panoptic Open-vocabulary Segment Anything | Code | 2 |
| PEM: Prototype-based Efficient MaskFormer for Image Segmentation | Code | 2 |
| Per-Pixel Classification is Not All You Need for Semantic Segmentation | Code | 2 |
| Scalable SoftGroup for 3D Instance Segmentation on Point Clouds | Code | 2 |
| Open-World Entity Segmentation | Code | 2 |
| CellViT: Vision Transformers for Precise Cell Segmentation and Classification | Code | 2 |
| SAD: Segment Any RGBD | Code | 2 |
| Context-Aware Video Instance Segmentation | Code | 2 |
| Scene-Centric Unsupervised Panoptic Segmentation | Code | 2 |
| A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting | Code | 2 |
| Mask2Former for Video Instance Segmentation | Code | 2 |
| OneFormer3D: One Transformer for Unified Point Cloud Segmentation | Code | 2 |
| Panoptic Lifting for 3D Scene Understanding with Neural Fields | Code | 2 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | Code | 2 |
| Better Call SAL: Towards Learning to Segment Anything in Lidar | Code | 2 |
| Image Segmentation in Foundation Model Era: A Survey | Code | 2 |
| Hierarchical Multi-Scale Attention for Semantic Segmentation | Code | 2 |
| Hierarchical Open-vocabulary Universal Image Segmentation | Code | 2 |
| Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model | Code | 2 |
| BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning | Code | 2 |
| Masked-attention Mask Transformer for Universal Image Segmentation | Code | 2 |
| Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | Code | 2 |
| A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | Code | 2 |
| Dilated Neighborhood Attention Transformer | Code | 2 |
| Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | Code | 2 |
| 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop | Code | 2 |
| ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Code | 2 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2 |
| Focal Modulation Networks | Code | 2 |

Benchmark Results

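The metric reported throughout the tables below is Panoptic Quality (PQ). Predicted and ground-truth segments of the same class are matched when their IoU exceeds 0.5; PQ is then the mean IoU over matched pairs (segmentation quality, SQ) multiplied by an F1-style recognition quality (RQ) over true positives, false positives, and false negatives. A minimal sketch of the formula, given a precomputed matching:

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = SQ * RQ for one class.

    matched_ious : IoU values of matched (TP) segment pairs, each > 0.5
    num_fp       : unmatched predicted segments (false positives)
    num_fn       : unmatched ground-truth segments (false negatives)
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    return sq * rq

# Toy example: two matches with IoUs 0.8 and 0.6, one FP, one FN.
print(round(panoptic_quality([0.8, 0.6], 1, 1), 3))  # 0.467
```

Leaderboard numbers average this per-class PQ over all classes; some tables report things-only (PQth) or stuff-only (PQst) variants.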
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Mask DINO (single-scale) | PQ | 59.5 | | Unverified |
| 2 | kMaX-DeepLab (single-scale) | PQ | 58.5 | | Unverified |
| 3 | Mask2Former (Swin-L) | PQ | 58.3 | | Unverified |
| 4 | Panoptic SegFormer (Swin-L) | PQ | 56.2 | | Unverified |
| 5 | Panoptic SegFormer (PVTv2-B5) | PQ | 55.8 | | Unverified |
| 6 | CMT-DeepLab (single-scale) | PQ | 55.7 | | Unverified |
| 7 | K-Net (Swin-L) | PQ | 55.2 | | Unverified |
| 8 | MaskConver (ResNet50, single-scale) | PQ | 53.6 | | Unverified |
| 9 | MaskFormer (Swin-L) | PQ | 53.3 | | Unverified |
| 10 | Panoptic FCN* (Swin-L) | PQ | 52.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | HyperSeg (Swin-B) | PQ | 61.2 | | Unverified |
| 2 | OneFormer (InternImage-H, single-scale) | PQ | 60 | | Unverified |
| 3 | UMG-CLIP-E/14 | PQ | 59.5 | | Unverified |
| 4 | OpenSeeD (SwinL, single-scale) | PQ | 59.5 | | Unverified |
| 5 | Mask DINO (SwinL, single-scale) | PQ | 59.4 | | Unverified |
| 6 | EoMT (DINOv2-g, single-scale, 1280x1280) | PQ | 59.2 | | Unverified |
| 7 | UMG-CLIP-L/14 | PQ | 58.9 | | Unverified |
| 8 | Panoptic FCN* (Swin-L, single-scale) | PQth | 58.5 | | Unverified |
| 9 | DiNAT-L (single-scale, Mask2Former) | PQ | 58.5 | | Unverified |
| 10 | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | PQ | 58.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OneFormer (DiNAT-L, single-scale) | PQ | 46.7 | | Unverified |
| 2 | OneFormer (ConvNeXt-L, single-scale) | PQ | 46.4 | | Unverified |
| 3 | Panoptic FCN* (Swin-L, single-scale) | PQ | 45.7 | | Unverified |
| 4 | Panoptic-DeepLab (SWideRNet-(1, 1, 4.5), multi-scale) | PQ | 44.8 | | Unverified |
| 5 | Panoptic FCN* (ResNet-50-FPN) | PQst | 42.3 | | Unverified |
| 6 | Mask2Former + Intra-Batch Supervision (ResNet-50) | PQ | 42.2 | | Unverified |
| 7 | Axial-DeepLab-L (multi-scale) | PQ | 41.1 | | Unverified |
| 8 | EfficientPS | PQ | 40.6 | | Unverified |
| 9 | Panoptic-DeepLab (X71) | PQ | 40.5 | | Unverified |
| 10 | AdaptIS (ResNeXt-101) | PQ | 40.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OneFormer (ConvNeXt-L, single-scale, Mapillary Vistas-pretrained) | PQ | 68 | | Unverified |
| 2 | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary, multi-scale) | PQ | 67.8 | | Unverified |
| 3 | EfficientPS | PQ | 67.1 | | Unverified |
| 4 | Axial-DeepLab-XL (Mapillary Vistas, multi-scale) | PQ | 66.6 | | Unverified |
| 5 | kMaX-DeepLab (single-scale) | PQ | 66.2 | | Unverified |
| 6 | Panoptic-DeepLab | PQ | 65.5 | | Unverified |
| 7 | EfficientPS (Cityscapes-fine) | PQ | 62.9 | | Unverified |
| 8 | COPS (ResNet-50) | PQ | 60 | | Unverified |
| 9 | SOGNet (ResNet-50) | PQ | 60 | | Unverified |
| 10 | Dynamically Instantiated Network | PQ | 55.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Mask2Former (Swin-B) | PQ | 41.7 | | Unverified |
| 2 | Panoptic FPN (ResNet-50) | PQ | 40.1 | | Unverified |
| 3 | Mask2Former (Swin-T) | PQ | 39.2 | | Unverified |
| 4 | Panoptic FPN (ResNet-101) | PQ | 38.7 | | Unverified |
| 5 | Mask2Former (ResNet-50) | PQ | 37.6 | | Unverified |
| 6 | Mask2Former (ResNet-101) | PQ | 37.2 | | Unverified |
| 7 | Panoptic-DeepLab (ResNet-50) | PQ | 34.7 | | Unverified |
| 8 | MaX-DeepLab | PQ | 31.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SuperCluster | PQ | 50.1 | | Unverified |
| 2 | PointGroup (Xiang 2023) | PQ | 42.3 | | Unverified |
| 3 | KPConv (Xiang 2023) | PQ | 41.8 | | Unverified |
| 4 | MinkowskiNet (Xiang 2023) | PQ | 39.2 | | Unverified |
| 5 | PointNet++ (Xiang 2023) | PQ | 24.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OneFormer3D | PQ | 71.2 | | Unverified |
| 2 | PanopticNDT (10cm) | PQ | 59.19 | | Unverified |
| 3 | SuperCluster | PQ | 58.7 | | Unverified |
| 4 | PanopticFusion (with CRF) | PQ | 33.5 | | Unverified |
| 5 | SceneGraphFusion (NN mapping) | PQ | 31.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EfficientPS | PQ | 51.1 | | Unverified |
| 2 | Seamless | PQ | 48.5 | | Unverified |
| 3 | UPSNet | PQ | 47.1 | | Unverified |
| 4 | Panoptic FPN | PQ | 46.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EfficientPS | PQ | 43.7 | | Unverified |
| 2 | Seamless | PQ | 42.2 | | Unverified |
| 3 | UPSNet | PQ | 39.9 | | Unverified |
| 4 | Panoptic FPN | PQ | 39.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | LKCell | PQ | 50.8 | | Unverified |
| 2 | CellViT-SAM-H | PQ | 50.62 | | Unverified |
| 3 | TSFD | PQ | 50.4 | | Unverified |
| 4 | NuLite-H | PQ | 49.81 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OneFormer3D | PQ | 71.2 | | Unverified |
| 2 | SuperCluster | PQ | 58.7 | | Unverified |
| 3 | PanopticFusion | PQ | 33.5 | | Unverified |
| 4 | SceneGraphFusion | PQ | 31.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Exchanger + Mask2Former | PQ | 52.6 | | Unverified |
| 2 | Exchanger + Unet + PaPs | PQ | 47.8 | | Unverified |
| 3 | U-TAE + PaPs | PQ | 40.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAN-B6* | PQ | 58.2 | | Unverified |
| 2 | PFPN (ideal number of groups) | PQ | 42.15 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CAFuser (Swin-T) | PQ | 59.7 | | Unverified |
| 2 | MUSES (Mask2Former w/ 4x Swin-T) | PQ | 53.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | PQ | 51.15 | | Unverified |
| 2 | EMSANet | PQ | 47.38 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | P3Former | PQ | 0.65 | | Unverified |
| 2 | DS-Net | PQ | 0.56 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MasQCLIP | PQ | 23.3 | | Unverified |