SOTAVerified

Zero-Shot Image Classification

Zero-shot image classification is a technique in computer vision where a model can classify images into categories that were not present during training. This is achieved by leveraging semantic information about the categories, such as textual descriptions or relationships between classes.
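The mechanism can be sketched in a few lines: embed the candidate class descriptions and the image into a shared space, then pick the class whose text embedding is most similar to the image embedding. The encoders below are stand-ins (orthogonal unit vectors and a noisy copy), not a real vision-language model such as CLIP; the prompts and dimensions are illustrative.

```python
import numpy as np

# Minimal sketch of zero-shot classification via a shared embedding space.
# A real system would obtain these embeddings from a pretrained
# vision-language encoder; here they are synthetic stand-ins.

class_prompts = ["a photo of a cat", "a photo of a dog", "a photo of a bird"]
dim = 8

# Stand-in text embeddings: one orthogonal unit vector per class prompt.
text_emb = np.eye(len(class_prompts), dim)

# Stand-in image embedding: near the "dog" prompt, plus a little noise.
rng = np.random.default_rng(0)
image_emb = text_emb[1] + 0.1 * rng.normal(size=dim)
image_emb /= np.linalg.norm(image_emb)

# Zero-shot prediction: cosine similarity against every class description,
# softmax over the similarities, argmax picks the label.
logits = text_emb @ image_emb
probs = np.exp(logits) / np.exp(logits).sum()
pred = class_prompts[int(np.argmax(probs))]
print(pred)  # → a photo of a dog
```

Because the class set enters only through the text prompts, new categories can be added at inference time simply by appending descriptions, with no retraining.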

Papers

Showing 76-100 of 111 papers

Title | Status | Hype
Image-Caption Encoding for Improving Zero-Shot Generalization | Code | 0
Segment Any Change | - | 0
CLAMP: Contrastive LAnguage Model Prompt-tuning | - | 0
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | - | 0
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models | Code | 0
Efficient Model-Agnostic Multi-Group Equivariant Networks | - | 0
Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models | - | 0
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | - | 0
Semantically-Prompted Language Models Improve Visual Descriptions | - | 0
Learning from Children: Improving Image-Caption Pretraining via Curriculum | Code | 0
Text-to-Image Diffusion Models are Zero-Shot Classifiers | Code | 0
Language-Driven Anchors for Zero-Shot Adversarial Robustness | Code | 0
Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities | - | 0
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training | - | 0
DiRaC-I: Identifying Diverse and Rare Training Classes for Zero-Shot Learning | - | 0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models | - | 0
CLIPPO: Image-and-Language Understanding from Pixels Only | - | 0
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification | - | 0
Generative Negative Text Replay for Continual Vision-Language Pretraining | - | 0
Text2Model: Text-based Model Induction for Zero-shot Image Classification | - | 0
Efficient Multilingual Multi-modal Pre-training through Triple Contrastive Loss | - | 0
I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification | - | 0
PaLI: A Jointly-Scaled Multilingual Language-Image Model | - | 0
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining | - | 0
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0
Page 4 of 5

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OpenClip H/14 (34B) (Laion2B) | Top-1 accuracy | 30.01 | - | Unverified
1 | CLIP (ViT B-32) | Average Score | 56.64 | - | Unverified
1 | GLIP (Tiny A) | Average Score | 11.4 | - | Unverified