SOTAVerified

Image-text Retrieval

Papers

Showing 26-50 of 248 papers

Title | Status | Hype
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
Image-text Retrieval via Preserving Main Semantics of Vision | Code | 1
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval | Code | 1
Learnable Pillar-based Re-ranking for Image-Text Retrieval | Code | 1
Equivariant Similarity for Vision-Language Foundation Models | Code | 1
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
Graph Optimal Transport for Cross-Domain Alignment | Code | 1
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | Code | 1
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment | Code | 1
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1
Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1

No leaderboard results yet.