SOTAVerified

Zero-shot Text-to-Image Retrieval

Papers

Showing 1–10 of 15 papers

Title | Status | Hype
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4
Sigmoid Loss for Language Image Pre-Training | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Code | 3
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
FLAVA: A Foundational Language And Vision Alignment Model | Code | 1
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions | Code | 0
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Code | 0

No leaderboard results yet.