Image-text Retrieval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–50 of 248 papers

Title	Date	Tasks	Status	Hype	Score
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation	Jan 28, 2022	Image CaptioningImage-text matching	CodeCode Available	5	5
Multi-label Cluster Discrimination for Visual Representation Learning	Jul 24, 2024	Contrastive LearningImage-text Retrieval	CodeCode Available	4	5
FG-CLIP: Fine-Grained Visual and Textual Alignment	May 8, 2025	Image-text Retrievalobject-detection	CodeCode Available	4	5
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models	Mar 31, 2024	Image-text RetrievalLanguage Modeling	CodeCode Available	3	5
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends	Oct 17, 2022	Few-Shot LearningImage Captioning	CodeCode Available	3	5
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Feb 9, 2025	Image CaptioningImage-text Retrieval	CodeCode Available	3	5
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	May 18, 2023	1 Image, 2*2 StitchiAction Classification	CodeCode Available	3	5
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation	Apr 4, 2023	Cross-Modal RetrievalImage-text Retrieval	CodeCode Available	3	5
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation	Jun 10, 2025	Image-text RetrievalQuestion Answering	CodeCode Available	2	5
VeCLIP: Improving CLIP Training via Visual-enriched Captions	Oct 11, 2023	Image-text RetrievalRetrieval	CodeCode Available	2	5
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval	Mar 8, 2024	Image-text RetrievalRetrieval	CodeCode Available	2	5
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis	Mar 25, 2025	Contrastive LearningImage-text Retrieval	CodeCode Available	2	5
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning	Mar 2, 2021	BIG-bench Machine LearningImage Retrieval	CodeCode Available	2	5
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature	Jan 13, 2025	ArticlesImage-text Retrieval	CodeCode Available	2	5
RWKV-CLIP: A Robust Vision-Language Representation Learner	Jun 11, 2024	Image-text RetrievalRepresentation Learning	CodeCode Available	2	5
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision	Feb 11, 2021	Cross-Modal RetrievalFine-Grained Image Classification	CodeCode Available	2	5
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents	Mar 13, 2023	image-classificationImage Classification	CodeCode Available	2	5
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing	Jun 19, 2023	ClassificationCross-Modal Retrieval	CodeCode Available	2	5
Vision-Language Pre-Training with Triple Contrastive Learning	Feb 21, 2022	Contrastive Learningcross-modal alignment	CodeCode Available	2	5
Accelerating Transformers with Spectrum-Preserving Token Merging	May 25, 2024	image-classificationImage Classification	CodeCode Available	2	5
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text	Oct 18, 2022	Contrastive LearningImage-text Retrieval	CodeCode Available	2	5
Frozen Transformers in Language Models Are Effective Visual Encoder Layers	Oct 19, 2023	Action RecognitionImage-text Retrieval	CodeCode Available	2	5
Cross-lingual and Multilingual CLIP	Jun 1, 2022	Contrastive LearningImage-text Retrieval	CodeCode Available	2	5
Towards Vision-Language Geo-Foundation Model: A Survey	Jun 13, 2024	Earth ObservationImage Captioning	CodeCode Available	2	5
DreamLIP: Language-Image Pre-training with Long Captions	Mar 25, 2024	Contrastive LearningImage-text Retrieval	CodeCode Available	2	5
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping	Apr 26, 2023	DecoderImage Captioning	CodeCode Available	1	5
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval	Jun 4, 2021	Graph MatchingImage Retrieval	CodeCode Available	1	5
Large-Scale Adversarial Training for Vision-and-Language Representation Learning	Jun 11, 2020	Image-text RetrievalQuestion Answering	CodeCode Available	1	5
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption	Aug 16, 2023	Action ClassificationImage-text Retrieval	CodeCode Available	1	5
FlexiViT: One Model for All Patch Sizes	Dec 15, 2022	AllImage-text Retrieval	CodeCode Available	1	5
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations	Jun 14, 2023	image-classificationImage Classification	CodeCode Available	1	5
Image-text Retrieval via Preserving Main Semantics of Vision	Apr 20, 2023	Cross-Modal RetrievalImage-text Retrieval	CodeCode Available	1	5
FILIP: Fine-grained Interactive Language-Image Pre-Training	Nov 9, 2021	image-classificationImage Classification	CodeCode Available	1	5
FETA: Towards Specializing Foundation Models for Expert Task Applications	Sep 8, 2022	Domain GeneralizationFew-Shot Learning	CodeCode Available	1	5
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval	Mar 8, 2020	Cross-Modal RetrievalImage-text Retrieval	CodeCode Available	1	5
Learnable Pillar-based Re-ranking for Image-Text Retrieval	Apr 25, 2023	Image-text RetrievalRe-Ranking	CodeCode Available	1	5
Equivariant Similarity for Vision-Language Foundation Models	Mar 25, 2023	Image-text RetrievalRetrieval	CodeCode Available	1	5
ESA: External Space Attention Aggregation for Image-Text Retrieval	Oct 10, 2023	Image-text RetrievalRetrieval	CodeCode Available	1	5
Graph Optimal Transport for Cross-Domain Alignment	Jun 26, 2020	Graph MatchingImage Captioning	CodeCode Available	1	5
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner	May 19, 2023	Dense CaptioningImage Captioning	CodeCode Available	1	5
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment	Aug 29, 2022	cross-modal alignmentImage-text Retrieval	CodeCode Available	1	5
Global and Local Semantic Completion Learning for Vision-Language Pre-training	Jun 12, 2023	cross-modal alignmentImage-text Retrieval	CodeCode Available	1	5
Dynamic Modality Interaction Modeling for Image-Text Retrieval	Jul 11, 2021	cross-modal alignmentCross-Modal Retrieval	CodeCode Available	1	5
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding	Jun 15, 2023	Contrastive Learningimage-classification	CodeCode Available	1	5
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation	Jul 16, 2021	Cross-Modal RetrievalGrounded language learning	CodeCode Available	1	5
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback	Jun 19, 2021	Image RetrievalImage-text Retrieval	CodeCode Available	1	5
A Survey of Medical Vision-and-Language Applications and Their Techniques	Nov 19, 2024	Decision MakingDiagnostic	CodeCode Available	1	5
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training	Jun 15, 2023	Image-text RetrievalRepresentation Learning	CodeCode Available	1	5
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers	May 27, 2023	Image CaptioningImage Retrieval	CodeCode Available	1	5
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition	Jan 1, 2021	Image-text RetrievalMedical Image Analysis	CodeCode Available	1	5

Show:10 25 50

← PrevPage 1 of 5Next →

No leaderboard results yet.