SOTAVerified

Zero-Shot Image Classification

Zero-shot image classification is a technique in computer vision where a model can classify images into categories that were not present during training. This is achieved by leveraging semantic information about the categories, such as textual descriptions or relationships between classes.
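The standard recipe (popularized by CLIP) embeds the image and each candidate class description into a shared space and picks the class whose text embedding is most similar to the image embedding. The sketch below illustrates just that matching step with fabricated stand-in embeddings; in a real system the vectors would come from trained image and text encoders over prompts like "a photo of a {class}".

```python
import numpy as np

def embed(seed, dim=64):
    """Stand-in for a real encoder: a deterministic random unit vector."""
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

# Candidate classes the model never needs to have seen during training;
# a CLIP-style system would embed textual prompts for each of them.
class_names = ["cat", "dog", "airplane"]
text_embs = np.stack([embed(i) for i in range(len(class_names))])

# Fabricate an image embedding that happens to lie near the "dog" description.
rng = np.random.default_rng(99)
image_emb = text_embs[1] + 0.05 * rng.normal(size=64)
image_emb /= np.linalg.norm(image_emb)

# Zero-shot prediction: cosine similarity (dot product of unit vectors)
# against every class description, then take the best match.
scores = text_embs @ image_emb
prediction = class_names[int(np.argmax(scores))]
print(prediction)  # prints "dog"
```

Because no classifier head is trained, extending the label set is just a matter of embedding one more text description, which is what makes the approach zero-shot.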

Papers

Showing 51–100 of 111 papers

Title | Status | Hype
----- | ------ | ----
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models | Code | 0
Text-to-Image Diffusion Models are Zero-Shot Classifiers | Code | 0
Segment Any Change | Code | 0
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | Code | 0
Altogether: Image Captioning via Re-aligning Alt-text | Code | 0
Towards Difficulty-Agnostic Efficient Transfer Learning for Vision-Language Models | Code | 0
PaLI: A Jointly-Scaled Multilingual Language-Image Model | Code | 0
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability | Code | 0
Multilingual Vision-Language Pre-training for the Remote Sensing Domain | Code | 0
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Code | 0
Learning from Children: Improving Image-Caption Pretraining via Curriculum | Code | 0
Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp | Code | 0
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Code | 0
MoDE: CLIP Data Experts via Clustering | Code | 0
Semantically-Prompted Language Models Improve Visual Descriptions | — | 0
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | — | 0
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision | — | 0
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene | — | 0
BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models | — | 0
Bayesian Test-Time Adaptation for Vision-Language Models | — | 0
Beyond the Visible: Multispectral Vision-Language Learning for Earth Observation | — | 0
Bridge the Modality and Capability Gaps in Vision-Language Model Selection | — | 0
CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization | — | 0
CLAMP: Contrastive LAnguage Model Prompt-tuning | — | 0
Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image Classification | — | 0
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | — | 0
CoAPT: Context Attribute words for Prompt Tuning | — | 0
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features | — | 0
DiRaC-I: Identifying Diverse and Rare Training Classes for Zero-Shot Learning | — | 0
Efficient Model-Agnostic Multi-Group Equivariant Networks | — | 0
Efficient Multilingual Multi-modal Pre-training through Triple Contrastive Loss | — | 0
Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning | — | 0
Gaze Embeddings for Zero-Shot Image Classification | — | 0
Generative Negative Text Replay for Continual Vision-Language Pretraining | — | 0
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training | — | 0
I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification | — | 0
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification | — | 0
Improving Semantic Embedding Consistency by Metric Learning for Zero-Shot Classification | — | 0
Integrating Propositional and Relational Label Side Information for Hierarchical Zero-Shot Image Classification | — | 0
It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap | — | 0
Language to Network: Conditional Parameter Adaptation with Natural Language Descriptions | — | 0
Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions | — | 0
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships | — | 0
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | — | 0
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | — | 0
MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification | — | 0
Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification | — | 0
Noise-Tolerant Few-Shot Unsupervised Adapter for Vision-Language Models | — | 0
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining | — | 0
RA-CLIP: Retrieval Augmented Contrastive Language-Image Pre-Training | — | 0
Page 2 of 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
--- | ----- | ------ | ------- | -------- | ------
1 | OpenClip H/14 (34B) (Laion2B) | Top-1 accuracy | 30.01 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
--- | ----- | ------ | ------- | -------- | ------
1 | CLIP (ViT B-32) | Average Score | 56.64 | — | Unverified

# | Model | Metric | Claimed | Verified | Status
--- | ----- | ------ | ------- | -------- | ------
1 | GLIP (Tiny A) | Average Score | 11.4 | — | Unverified