SOTAVerified

Zero-Shot Image Classification

Zero-shot image classification is a technique in computer vision where a model can classify images into categories that were not present during training. This is achieved by leveraging semantic information about the categories, such as textual descriptions or relationships between classes.
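The idea above can be sketched in code. This is a minimal, illustrative example of CLIP-style zero-shot classification, where class names are turned into text prompts and the image is assigned to the class whose prompt embedding is most similar. The `embed` function here is a deterministic stand-in (a seeded random unit vector), not a real encoder; an actual system would use pretrained image and text encoders such as CLIP's.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding: a seeded random unit vector per input string.
    A real pipeline would call a pretrained text/image encoder instead."""
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def zero_shot_classify(image_emb: np.ndarray, class_names: list[str]) -> str:
    """Score an image embedding against one text prompt per class name
    and return the class whose prompt embedding is most similar."""
    prompts = [f"a photo of a {name}" for name in class_names]
    text_embs = np.stack([embed(p) for p in prompts])
    sims = text_embs @ image_emb  # cosine similarity (all vectors unit-norm)
    return class_names[int(np.argmax(sims))]

# Simulate an image whose embedding lies near its label's prompt embedding.
rng = np.random.default_rng(0)
img = embed("a photo of a zebra") + 0.05 * rng.standard_normal(64)
img /= np.linalg.norm(img)

# "zebra" needs no training examples -- only its name as a text prompt.
print(zero_shot_classify(img, ["dog", "horse", "zebra"]))  # prints "zebra"
```

Because classification reduces to nearest-neighbor search over prompt embeddings, unseen categories can be added at inference time simply by listing their names.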

Papers

Showing 51–100 of 111 papers

| Title | Status | Hype |
|---|---|---|
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | Code | 2 |
| Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models | — | 0 |
| GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | — | 0 |
| PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts | Code | 1 |
| PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization | Code | 1 |
| Distilling Large Vision-Language Model with Out-of-Distribution Generalizability | Code | 1 |
| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2 |
| Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1 |
| Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1 |
| Semantically-Prompted Language Models Improve Visual Descriptions | — | 0 |
| Learning from Children: Improving Image-Caption Pretraining via Curriculum | Code | 0 |
| CamDiff: Camouflage Image Augmentation via Diffusion Model | Code | 1 |
| Text-to-Image Diffusion Models are Zero-Shot Classifiers | Code | 0 |
| Structure Pretraining and Prompt Tuning for Knowledge Graph Transfer | Code | 1 |
| CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets | Code | 1 |
| Language-Driven Anchors for Zero-Shot Adversarial Robustness | Code | 0 |
| Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities | — | 0 |
| LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval | Code | 1 |
| RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training | — | 0 |
| DiRaC-I: Identifying Diverse and Rare Training Classes for Zero-Shot Learning | — | 0 |
| When are Lemons Purple? The Concept Association Bias of Vision-Language Models | — | 0 |
| CLIPPO: Image-and-Language Understanding from Pixels Only | Code | 0 |
| Reproducible scaling laws for contrastive language-image learning | Code | 1 |
| I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification | — | 0 |
| AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Code | 4 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5 |
| Generative Negative Text Replay for Continual Vision-Language Pretraining | — | 0 |
| Text2Model: Text-based Model Induction for Zero-shot Image Classification | — | 0 |
| General Image Descriptors for Open World Image Retrieval using ViT CLIP | Code | 1 |
| Efficient Multilingual Multi-modal Pre-training through Triple Contrastive Loss | — | 0 |
| I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification | — | 0 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | Code | 0 |
| What does a platypus look like? Generating customized prompts for zero-shot image classification | Code | 2 |
| Zero-Shot Temporal Action Detection via Vision-Language Prompting | Code | 1 |
| DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning | Code | 1 |
| Disentangled Ontology Embedding for Zero-shot Learning | Code | 1 |
| Masked Unsupervised Self-training for Label-free Image Classification | Code | 1 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | Code | 1 |
| PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining | — | 0 |
| Zero-Shot Logit Adjustment | Code | 1 |
| ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models | Code | 4 |
| Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification | Code | 1 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0 |
| A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | Code | 1 |
| A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision | — | 0 |
| Soundify: Matching Sound Effects to Video | — | 0 |
| LiT: Zero-Shot Transfer with Locked-image text Tuning | Code | 1 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1 |
| Benchmarking Knowledge-driven Zero-shot Learning | Code | 1 |
| Zero-sample surface defect detection and classification based on semantic feedback neural network | — | 0 |
Page 2 of 3

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OpenClip H/14 (34B) (Laion2B) | Top-1 accuracy | 30.01 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CLIP (ViT B-32) | Average Score | 56.64 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GLIP (Tiny A) | Average Score | 11.4 | — | Unverified |