SOTAVerified

Zero-shot Text-to-Image Retrieval

Papers

Showing 1–15 of 15 papers

| Title | Status | Hype |
| --- | --- | --- |
| An analysis of vision-language models for fabric retrieval | | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | | 0 |
| MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Code | 3 |
| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | Code | 0 |
| Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment | Code | 2 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | | 0 |
| Sigmoid Loss for Language Image Pre-Training | Code | 3 |
| BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Code | 5 |
| ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | Code | 0 |
| Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | | 0 |
| FLAVA: A Foundational Language And Vision Alignment Model | Code | 1 |
| Learning Transferable Visual Models From Natural Language Supervision | Code | 2 |
| ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions | Code | 0 |
No leaderboard results yet.