SOTAVerified

Image-text Retrieval

Papers

Showing 76-100 of 248 papers

Title | Status | Hype
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Code | 1
Learnable Pillar-based Re-ranking for Image-Text Retrieval | Code | 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
Learning Relation Alignment for Calibrated Cross-modal Retrieval | Code | 1
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
Graph Optimal Transport for Cross-Domain Alignment | Code | 1
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
Multimodal Federated Learning via Contrastive Representation Ensemble | Code | 1
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval | Code | 1
FETA: Towards Specializing Foundation Models for Expert Task Applications | Code | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval | Code | 1
FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1
I0T: Embedding Standardization Method Towards Zero Modality Gap | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1
Image-text Retrieval via Preserving Main Semantics of Vision | Code | 1
Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval | Code | 1
MLLMs-Augmented Visual-Language Representation Learning | Code | 1
VladVA: Discriminative Fine-tuning of LVLMs | - | 0
Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | - | 0

No leaderboard results yet.