Zero-shot Text-to-Image Retrieval

Papers

Showing 1–15 of 15 papers

Title | Status | Hype
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Sigmoid Loss for Language Image Pre-Training | Code | 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Code | 3
Learning Transferable Visual Models From Natural Language Supervision | Code | 2
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2
FLAVA: A Foundational Language And Vision Alignment Model | Code | 1
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | - | 0
An analysis of vision-language models for fabric retrieval | - | 0
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | - | 0
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | - | 0
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions | Code | 0
M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Code | 0
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | Code | 0

No leaderboard results yet.