SOTAVerified

Image-text Retrieval

Papers

Showing 26-50 of 248 papers

Title | Status | Hype
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning | Code | 1
I0T: Embedding Standardization Method Towards Zero Modality Gap | Code | 1
A Survey of Medical Vision-and-Language Applications and Their Techniques | Code | 1
Nearest Neighbor Normalization Improves Multimodal Retrieval | Code | 1
PC^2: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval | Code | 1
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching | Code | 1
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | Code | 1
PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | Code | 1
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
MLLMs-Augmented Visual-Language Representation Learning | Code | 1
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment | Code | 1
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models | Code | 1
mCLIP: Multilingual CLIP via Cross-lingual Transfer | Code | 1
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
Global and Local Semantic Completion Learning for Vision-Language Pre-training | Code | 1

No leaderboard results yet.