SOTAVerified

Image-text Retrieval

Papers

Showing 101-110 of 248 papers

Title | Status | Hype
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | - | 0
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | - | 0
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models | Code | 1
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP | - | 0
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | - | 0
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Code | 2

No leaderboard results yet.