SOTAVerified

Image-text Retrieval

Papers

Showing 51-75 of 248 papers

Title | Status | Hype
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Code | 1
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Code | 1
Graph Optimal Transport for Cross-Domain Alignment | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Code | 1
Hyperbolic Image-Text Representations | Code | 1
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Code | 1
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1
I0T: Embedding Standardization Method Towards Zero Modality Gap | Code | 1
ComCLIP: Training-Free Compositional Image and Text Matching | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Page 3 of 10

No leaderboard results yet.