SOTAVerified

Image-text Retrieval

Papers

Showing 151–175 of 248 papers

Title | Status | Hype
Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction | - | 0
Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | - | 0
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Code | 0
Enhancing Image-Text Matching with Adaptive Feature Aggregation | Code | 0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | - | 0
Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | - | 0
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | - | 0
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | - | 0
A New Fine-grained Alignment Method for Image-text Matching | - | 0
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | - | 0
Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | - | 0
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | - | 0
Constructing Image-Text Pair Dataset from Books | - | 0
Dual Relation Alignment for Composed Image Retrieval | - | 0
MultiWay-Adapter: Adapting large-scale multi-modal models for scalable image-text retrieval | Code | 0
Contrastive Feature Masking Open-Vocabulary Vision Transformer | - | 0
DLIP: Distilling Language-Image Pre-training | - | 0
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | - | 0
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks | - | 0
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP | - | 0
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | - | 0
Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval | Code | 0
Hypernymization of named entity-rich captions for grounding-based multi-modal pretraining | - | 0
RECLIP: Resource-efficient CLIP by Training with Small Images | - | 0

No leaderboard results yet.