BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (Jan 28, 2022). Tasks: Image Captioning, Image-text matching. Code available (5).
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching (Mar 19, 2025). Tasks: Image-text matching, Text Matching. Code available (2).
FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization (Jan 17, 2025). Tasks: Anomaly Detection, Image-text matching. Code available (2).
MouSi: Poly-Visual-Expert Vision-Language Models (Jan 30, 2024). Tasks: Image Segmentation, Image-text matching. Code available (2).
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models (Jul 24, 2023). Tasks: Image Generation, Image-text matching. Code available (2).
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (Mar 22, 2023). Tasks: Image-text matching, Language Modeling. Code available (2).
Language Models Can See: Plugging Visual Controls in Text Generation (May 5, 2022). Tasks: Image Captioning, Image-text matching. Code available (2).
VinVL: Revisiting Visual Representations in Vision-Language Models (Jan 2, 2021). Tasks: Image Captioning, Image-text matching. Code available (2).
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks (Apr 13, 2020). Tasks: Cross-Modal Retrieval, Image Captioning. Code available (2).
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models (Jun 10, 2025). Tasks: Contrastive Learning, Image-text matching. Code available (1).
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP (Mar 5, 2025). Tasks: Adversarial Robustness, Image-text matching. Code available (1).
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis (Mar 2, 2025). Tasks: Image Segmentation, Image-text matching. Code available (1).
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation (Feb 27, 2025). Tasks: Image-text matching, Object. Code available (1).
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning (Feb 27, 2025). Tasks: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence. Code available (1).
Image-text matching for large-scale book collections (Jul 29, 2024). Tasks: Image-text matching, Optical Character Recognition (OCR). Code available (1).
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching (Jul 11, 2024). Tasks: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence. Code available (1).
Composing Object Relations and Attributes for Image-Text Matching (Jun 17, 2024). Tasks: Attribute, Graph Attention. Code available (1).
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching (Apr 28, 2024). Tasks: Contrastive Learning, Image-text matching. Code available (1).
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training (Mar 15, 2024). Tasks: Diagnostic, image-classification. Code available (1).
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation (Feb 7, 2024). Tasks: Image Generation, Image-text matching. Code available (1).
Negative Pre-aware for Noisy Cross-modal Matching (Dec 10, 2023). Tasks: Cross-modal retrieval with noisy correspondence, Image-text matching. Code available (1).
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding (Nov 30, 2023). Tasks: Attribute, Compositional Zero-Shot Learning. Code available (1).
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models (Nov 28, 2023). Tasks: Image Captioning, Image-text matching. Code available (1).
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts (Nov 16, 2023). Tasks: Binary Classification, Descriptive. Code available (1).
Cross-modal Active Complementary Learning with Self-refining Correspondence (Oct 26, 2023). Tasks: Cross-modal retrieval with noisy correspondence, Image-text matching. Code available (1).
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval (Sep 29, 2023). Tasks: Cross-Modal Retrieval, Image-text matching. Code available (1).
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval (Aug 24, 2023). Tasks: Cross-Modal Retrieval, Image-text matching. Code available (1).
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination (Aug 8, 2023). Tasks: Image-text matching, Representation Learning. Code available (1).
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method (Jul 21, 2023). Tasks: Image-text matching, Text Matching. Code available (1).
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding (Jul 3, 2023). Tasks: Image-text matching, Sentence. Code available (1).
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark (Jun 5, 2023). Tasks: Attribute, Image-text matching. Code available (1).
Revisiting the Role of Language Priors in Vision-Language Models (Jun 2, 2023). Tasks: Image-text matching, Image-text Retrieval. Code available (1).
Improved Probabilistic Image-Text Representations (May 29, 2023). Tasks: Data Augmentation, Image-text matching. Code available (1).
Are Diffusion Models Vision-And-Language Reasoners? (May 25, 2023). Tasks: Denoising, Image Generation. Code available (1).
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners (May 18, 2023). Tasks: Image Generation, Image-text matching. Code available (1).
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation (May 18, 2023). Tasks: Attribute, Image Generation. Code available (1).
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations (May 6, 2023). Tasks: Image-text matching, Text Matching. Code available (1).
Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation (Mar 29, 2023). Tasks: Image Captioning, Image-text matching. Code available (1).
Plug-and-Play Regulators for Image-Text Matching (Mar 23, 2023). Tasks: Cross-Modal Retrieval, Image Retrieval. Code available (1).
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency (Mar 22, 2023). Tasks: Cross-modal retrieval with noisy correspondence, Image-text matching. Code available (1).
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding (Feb 25, 2023). Tasks: Brain Decoding, Image Generation. Code available (1).
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network (Jan 1, 2023). Tasks: Image-text matching, Retrieval. Code available (1).
Learning Semantic Relationship Among Instances for Image-Text Matching (Jan 1, 2023). Tasks: Cross-Modal Retrieval, Image Retrieval. Code available (1).
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval (Dec 6, 2022). Tasks: Cross-Modal Retrieval, Image-text matching. Code available (1).
ComCLIP: Training-Free Compositional Image and Text Matching (Nov 25, 2022). Tasks: Image-text matching, Image-text Retrieval. Code available (1).
Self-supervised vision-language pretraining for Medical visual question answering (Nov 24, 2022). Tasks: Contrastive Learning, Image-text matching. Code available (1).
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model (Oct 11, 2022). Tasks: Contrastive Learning, Image-text matching. Code available (1).
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training (Aug 8, 2022). Tasks: Image-text matching, Language Modeling. Code available (1).
Zero-Shot Video Captioning with Evolving Pseudo-Tokens (Jul 22, 2022). Tasks: Image Captioning, Image-text matching. Code available (1).
Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer (Jul 5, 2022). Tasks: Image-text matching, Knowledge Distillation. Code available (1).