SOTAVerified

Zero-Shot Transfer Image Classification

Papers

Showing 1–10 of 19 papers

Title | Status | Hype
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | Code | 0
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Code | 0
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability | Code | 1
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0
Your Diffusion Model is Secretly a Zero-Shot Classifier | Code | 2
EVA-CLIP: Improved Training Techniques for CLIP at Scale | Code | 1
The effectiveness of MAE pre-pretraining for billion-scale pretraining | Code | 1
Scaling Vision Transformers to 22 Billion Parameters | Code | 0
Learning Customized Visual Models with Retrieval-Augmented Knowledge | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | M2-Encoder | Accuracy (Private) | 88.5 | | Unverified
2 | BASIC (Lion) | Accuracy (Private) | 88.3 | | Unverified
3 | CoCa | Accuracy (Private) | 86.3 | | Unverified
4 | LiT-22B | Accuracy (Private) | 85.9 | | Unverified
5 | BASIC | Accuracy (Private) | 85.7 | | Unverified
6 | LiT ViT-e | Accuracy (Private) | 85.4 | | Unverified
7 | LiT-tuning | Accuracy (Private) | 84.5 | | Unverified
8 | IMP-MoE-L | Accuracy (Private) | 83.9 | | Unverified
9 | EVA-CLIP-18B | Accuracy (Private) | 83.8 | | Unverified
10 | InternVL-C | Accuracy (Private) | 83.2 | | Unverified
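The contrastive image-text models on this leaderboard (CLIP, BASIC, LiT, CoCa, and relatives) are typically evaluated zero-shot by the same protocol: embed the image, embed a text prompt per class (e.g. "a photo of a {label}"), and pick the class whose text embedding is most similar to the image embedding. The sketch below illustrates only that ranking step, with synthetic 4-d embeddings standing in for a real encoder's output; the function name `zero_shot_classify` is illustrative, not any model's actual API.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Rank class labels by cosine similarity between one image
    embedding and per-class text-prompt embeddings (CLIP-style)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per class
    return labels[int(np.argmax(sims))], sims

# Synthetic embeddings: in practice these come from the model's
# image and text encoders, not hand-written vectors.
labels = ["cat", "dog"]
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
image_emb = np.array([0.9, 0.1, 0.0, 0.0])  # closer to the "cat" prompt
pred, sims = zero_shot_classify(image_emb, text_embs, labels)
print(pred)  # → cat
```

Because no class-specific training is involved, accuracy on a benchmark like the one above depends entirely on how well the pretrained encoders align images with prompt text.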