Learning Semantic Relationship Among Instances for Image-Text Matching Jan 1, 2023 Cross-Modal Retrieval Image Retrieval
Are Diffusion Models Vision-And-Language Reasoners? May 25, 2023 Denoising Image Generation
Learning with Noisy Correspondence for Cross-modal Matching Dec 1, 2021 Cross-Modal Retrieval Cross-modal retrieval with noisy correspondence
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation May 18, 2023 Attribute Image Generation
Adaptive Offline Quintuplet Loss for Image-Text Matching Mar 7, 2020 Image-text matching Text Matching
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model Oct 11, 2022 Contrastive Learning Image-text matching
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency Mar 22, 2023 Cross-modal retrieval with noisy correspondence Image-text matching
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval Sep 29, 2023 Cross-Modal Retrieval Image-text matching
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training Mar 15, 2024 Diagnostic image-classification
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning Feb 27, 2025 Cross-Modal Retrieval Cross-modal retrieval with noisy correspondence
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding Feb 25, 2023 Brain Decoding Image Generation
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO Apr 7, 2022 Image-text matching Text Matching
Self-supervised vision-language pretraining for Medical visual question answering Nov 24, 2022 Contrastive Learning Image-text matching
Similarity Reasoning and Filtration for Image-Text Matching Jan 5, 2021 Cross-Modal Retrieval Image Retrieval
Stacked Cross Attention for Image-Text Matching Mar 21, 2018 Cross-Modal Retrieval Image Retrieval
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations May 6, 2023 Image-text matching Text Matching
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models Jun 10, 2025 Contrastive Learning Image-text matching
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding Nov 30, 2023 Attribute Compositional Zero-Shot Learning
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method Jul 21, 2023 Image-text matching Text Matching
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark Jun 5, 2023 Attribute Image-text matching
Transformer Reasoning Network for Image-Text Matching and Retrieval Apr 20, 2020 Image Retrieval Image-text matching
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP Mar 5, 2025 Adversarial Robustness Image-text matching
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation Jul 16, 2021 Cross-Modal Retrieval Grounded language learning
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching Jul 11, 2024 Cross-Modal Retrieval Cross-modal retrieval with noisy correspondence
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network Jan 1, 2023 Image-text matching Retrieval
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation Feb 27, 2025 Image-text matching Object
Graph Structured Network for Image-Text Matching Apr 1, 2020 Attribute Cross-Modal Retrieval
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? Oct 21, 2022 Image-text matching Language Modeling
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking Jan 29, 2024 Image-text matching Text Matching
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Apr 11, 2017 Image-text matching Retrieval
Integrating Language Guidance Into Image-Text Matching for Correcting False Negatives Mar 24, 2023 Cross-modal retrieval with noisy correspondence Image-text matching
Learning fragment self-attention embeddings for image-text matching Oct 1, 2019 Image-text matching Sentence
Dual Attention Networks for Multimodal Reasoning and Matching Nov 2, 2016 Collaborative Inference Image-text matching
Position Focused Attention Network for Image-Text Matching Jul 23, 2019 Image-text matching Position
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models Apr 21, 2023 Cross-Modal Retrieval Image-text matching
Generative Visual Instruction Tuning Jun 17, 2024 Image Generation Image-text matching
Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP Jan 1, 2024 Backdoor Attack Contrastive Learning
MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval May 18, 2023 Image-text matching Retrieval
GR-GAN: Gradual Refinement Text-to-image Generation May 23, 2022 Generative Adversarial Network Image Generation
Deep Cross-Modal Projection Learning for Image-Text Matching Sep 1, 2018 Cross-Modal Retrieval Image-text matching
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model Jun 18, 2024 Image-text matching Language Modeling
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks Sep 14, 2023 Image-text matching Sarcasm Detection
Increasing Textual Context Size Boosts Medical Image-Text Matching Mar 23, 2023 Image-text matching Text Matching
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search Sep 28, 2023 cross-modal alignment Cross-Modal Retrieval
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking Aug 12, 2019 Binary Classification General Classification
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering Sep 9, 2023 Image Captioning Image-text matching
Enhancing Image-Text Matching with Adaptive Feature Aggregation Jan 18, 2024 Image-text matching Image-text Retrieval
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval Jul 29, 2022 Cross-Modal Retrieval Image-text matching
Evaluating Attribute Comprehension in Large Vision-Language Models Aug 25, 2024 Attribute Image-text matching
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets Mar 5, 2024 Diversity Image Description