Image-text Retrieval

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–25 of 248 papers

Title	Date	Tasks	Status	Hype
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation	Jan 28, 2022	Image CaptioningImage-text matching	CodeCode Available	5
FG-CLIP: Fine-Grained Visual and Textual Alignment	May 8, 2025	Image-text Retrievalobject-detection	CodeCode Available	4
Multi-label Cluster Discrimination for Visual Representation Learning	Jul 24, 2024	Contrastive LearningImage-text Retrieval	CodeCode Available	4
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding	Feb 9, 2025	Image CaptioningImage-text Retrieval	CodeCode Available	3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	May 18, 2023	1 Image, 2*2 StitchiAction Classification	CodeCode Available	3
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation	Apr 4, 2023	Cross-Modal RetrievalImage-text Retrieval	CodeCode Available	3
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends	Oct 17, 2022	Few-Shot LearningImage Captioning	CodeCode Available	3
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models	Mar 31, 2024	Image-text RetrievalLanguage Modeling	CodeCode Available	3
Towards Vision-Language Geo-Foundation Model: A Survey	Jun 13, 2024	Earth ObservationImage Captioning	CodeCode Available	2
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval	Mar 8, 2024	Image-text RetrievalRetrieval	CodeCode Available	2
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision	Feb 11, 2021	Cross-Modal RetrievalFine-Grained Image Classification	CodeCode Available	2
RWKV-CLIP: A Robust Vision-Language Representation Learner	Jun 11, 2024	Image-text RetrievalRepresentation Learning	CodeCode Available	2
Vision-Language Pre-Training with Triple Contrastive Learning	Feb 21, 2022	Contrastive Learningcross-modal alignment	CodeCode Available	2
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text	Oct 18, 2022	Contrastive LearningImage-text Retrieval	CodeCode Available	2
PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents	Mar 13, 2023	image-classificationImage Classification	CodeCode Available	2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation	Jun 10, 2025	Image-text RetrievalQuestion Answering	CodeCode Available	2
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature	Jan 13, 2025	ArticlesImage-text Retrieval	CodeCode Available	2
VeCLIP: Improving CLIP Training via Visual-enriched Captions	Oct 11, 2023	Image-text RetrievalRetrieval	CodeCode Available	2
DreamLIP: Language-Image Pre-training with Long Captions	Mar 25, 2024	Contrastive LearningImage-text Retrieval	CodeCode Available	2
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis	Mar 25, 2025	Contrastive LearningImage-text Retrieval	CodeCode Available	2
Frozen Transformers in Language Models Are Effective Visual Encoder Layers	Oct 19, 2023	Action RecognitionImage-text Retrieval	CodeCode Available	2
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing	Jun 19, 2023	ClassificationCross-Modal Retrieval	CodeCode Available	2
Accelerating Transformers with Spectrum-Preserving Token Merging	May 25, 2024	image-classificationImage Classification	CodeCode Available	2
Cross-lingual and Multilingual CLIP	Jun 1, 2022	Contrastive LearningImage-text Retrieval	CodeCode Available	2
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning	Mar 2, 2021	BIG-bench Machine LearningImage Retrieval	CodeCode Available	2

Show:10 25 50

← PrevPage 1 of 10Next →

No leaderboard results yet.