SOTAVerified

Zero-Shot Image Classification

Zero-shot image classification is a technique in computer vision where a model can classify images into categories that were not present during training. This is achieved by leveraging semantic information about the categories, such as textual descriptions or relationships between classes.
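A minimal sketch of how semantic information can stand in for training examples: each candidate category is described by a text prompt, both the prompt and the image are mapped into a shared embedding space, and the image is assigned to the category whose text embedding is most similar. The vectors below are hand-made toy values, not outputs of any real model, and the names `cosine_similarity`, `softmax`, `zero_shot_classify`, and the `temperature` parameter are illustrative assumptions. In practice the embeddings would typically come from a pretrained vision-language encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=1.0):
    """Turn similarity scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, class_embeddings, temperature=0.1):
    """Pick the class whose text embedding is closest to the image embedding.

    class_embeddings maps a text prompt to its (hypothetical) embedding.
    No class needs to have appeared in any image training set.
    """
    names = list(class_embeddings)
    sims = [cosine_similarity(image_embedding, class_embeddings[n]) for n in names]
    probs = softmax(sims, temperature)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


# Toy embeddings (assumed values, for illustration only).
class_embeddings = {
    "a photo of a zebra": [0.9, 0.1, 0.3],
    "a photo of a horse": [0.7, 0.6, 0.1],
    "a photo of a tiger": [0.1, 0.2, 0.9],
}
image_embedding = [0.85, 0.15, 0.35]

label, probs = zero_shot_classify(image_embedding, class_embeddings)
print(label)
```

Adding a new category in this setup only requires a new text prompt and its embedding; no image of that category has to be seen beforehand.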

Papers

Showing 76-100 of 111 papers

Title | Status | Hype
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | - | 0
MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification | - | 0
Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification | - | 0
Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models | - | 0
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining | - | 0
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training | - | 0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation | - | 0
Retrieval-enriched zero-shot image classification in low-resource domains | - | 0
Semantic Compositions Enhance Vision-Language Contrastive Learning | - | 0
Soundify: Matching Sound Effects to Video | - | 0
Text2Model: Text-based Model Induction for Zero-shot Image Classification | - | 0
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives | - | 0
Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities | - | 0
Visual-Semantic Embedding Model Informed by Structured Knowledge | - | 0
When are Lemons Purple? The Concept Association Bias of Vision-Language Models | - | 0
Zero-sample surface defect detection and classification based on semantic feedback neural network | - | 0
Zero-Shot Image Classification Using Coupled Dictionary Embedding | - | 0
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0
Text-to-Image Diffusion Models are Zero-Shot Classifiers | Code | 0
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models | Code | 0
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | Code | 0
Language-Driven Anchors for Zero-Shot Adversarial Robustness | Code | 0
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models | Code | 0
Do Vision-Language Foundational models show Robust Visual Perception? | Code | 0
Image-Caption Encoding for Improving Zero-Shot Generalization | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | OpenClip H/14 (34B)(Laion2B) | Top-1 accuracy | 30.01 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLIP (ViT B-32) | Average Score | 56.64 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GLIP (Tiny A) | Average Score | 11.4 | - | Unverified