SOTAVerified

Zero-Shot Image Classification

Zero-shot image classification is a computer vision technique in which a model classifies images into categories that were not present during training. It does so by leveraging semantic information about the categories, such as textual descriptions or relationships between classes.
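One common way to use textual descriptions for this is to compare an image embedding against text embeddings of the candidate class descriptions and pick the closest one. The following is a minimal sketch of that idea, not any particular paper's method. It assumes a CLIP-style model has already mapped images and text into a shared vector space. The three-dimensional vectors are hand-made toy values, and the function names `cosine_similarity` and `zero_shot_classify` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, class_embeddings):
    """Score an image against every class description and return the best match.

    class_embeddings maps a textual class description to its embedding.
    No class needs training images; a new class only needs a description
    that can be embedded.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in class_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings (assumed values, standing in for a real text encoder's output).
class_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a zebra": [0.0, 0.1, 0.9],
}

# Toy image embedding (assumed value, standing in for an image encoder's output).
image_embedding = [0.1, 0.2, 0.95]

label, scores = zero_shot_classify(image_embedding, class_embeddings)
print(label)
```

In a real system the embeddings would come from a pretrained vision-language model, and adding a category would amount to adding one more entry to `class_embeddings`.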

Papers

Showing 1-10 of 111 papers

Title | Status | Hype
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Code | 4
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Code | 4
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | Code | 3
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Code | 2
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration | Code | 2
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion | Code | 2
Mitigate the Gap: Investigating Approaches for Improving Cross-Modal Alignment in CLIP | Code | 2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OpenClip H/14 (34B)(Laion2B) | Top-1 accuracy | 30.01 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP (ViT B-32) | Average Score | 56.64 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GLIP (Tiny A) | Average Score | 11.4 | | Unverified