SOTAVerified

Image-text Retrieval

Papers

Showing 76-100 of 248 papers

| Title | Status | Hype |
| --- | --- | --- |
| MixGen: A New Multi-Modal Data Augmentation | Code | 1 |
| Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Code | 1 |
| Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1 |
| mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | Code | 1 |
| Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval | Code | 1 |
| FILIP: Fine-grained Interactive Language-Image Pre-Training | Code | 1 |
| VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts | Code | 1 |
| Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Code | 1 |
| Dynamic Modality Interaction Modeling for Image-Text Retrieval | Code | 1 |
| CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Code | 1 |
| A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Code | 1 |
| Learning Relation Alignment for Calibrated Cross-modal Retrieval | Code | 1 |
| LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval | Code | 1 |
| GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1 |
| Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Code | 1 |
| Graph Optimal Transport for Cross-Domain Alignment | Code | 1 |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Code | 1 |
| Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Code | 1 |
| IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval | Code | 1 |
| Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1 |
| UNITER: UNiversal Image-TExt Representation Learning | Code | 1 |
| Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | | 0 |
| Adding simple structure at inference improves Vision-Language Compositionality | Code | 0 |

No leaderboard results yet.