Zero-shot Text-to-Image Retrieval

Papers

Showing 1–15 of 15 papers

Title | Status | Hype
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Sigmoid Loss for Language Image Pre-Training | Code | 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Code | 3
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2
FLAVA: A Foundational Language And Vision Alignment Model | Code | 1
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | - | 0
An analysis of vision-language models for fabric retrieval | - | 0
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | - | 0
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | - | 0
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions | Code | 0
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Code | 0
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | Code | 0

No leaderboard results yet.