SOTAVerified

Image-text Retrieval

Papers

Showing 51–100 of 248 papers

Title | Status | Hype
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | - | 0
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval | Code | 0
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching | Code | 1
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | - | 0
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? | - | 0
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning | Code | 0
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Towards Vision-Language Geo-Foundation Model: A Survey | Code | 2
RWKV-CLIP: A Robust Vision-Language Representation Learner | Code | 2
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval | - | 0
Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training | - | 0
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval | Code | 1
Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships | - | 0
Accelerating Transformers with Spectrum-Preserving Token Merging | Code | 2
Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples | - | 0
PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | Code | 1
Global–Local Information Soft-Alignment for Cross-Modal Remote-Sensing Image–Text Retrieval | - | 0
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation | - | 0
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement | - | 0
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Code | 3
DreamLIP: Language-Image Pre-training with Long Captions | Code | 2
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning | Code | 1
Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction | - | 0
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | - | 0
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Code | 2
Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples | - | 0
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Code | 0
Enhancing Image-Text Matching with Adaptive Feature Aggregation | Code | 0
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | - | 0
Filter & Align: Leveraging Human Knowledge to Curate Image-Text Data | - | 0
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | - | 0
MLLMs-Augmented Visual-Language Representation Learning | Code | 1
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | - | 0
A New Fine-grained Alignment Method for Image-text Matching | - | 0
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval | - | 0
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Code | 1
Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Code | 2
Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | - | 0
Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning | - | 0
VeCLIP: Improving CLIP Training via Visual-enriched Captions | Code | 2
ESA: External Space Attention Aggregation for Image-Text Retrieval | Code | 1
Constructing Image-Text Pair Dataset from Books | - | 0
Dual Relation Alignment for Composed Image Retrieval | - | 0
MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval | Code | 0
Contrastive Feature Masking Open-Vocabulary Vision Transformer | - | 0
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment | Code | 1
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval | Code | 1
DLIP: Distilling Language-Image Pre-training | - | 0

No leaderboard results yet.