BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Jan 28, 2022. Tasks: Image Captioning; Image-text matching. Code available (5).

Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Mar 19, 2025. Tasks: Image-text matching; Text Matching. Code available (2).

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
Jul 24, 2023. Tasks: Image Generation; Image-text matching. Code available (2).

MouSi: Poly-Visual-Expert Vision-Language Models
Jan 30, 2024. Tasks: Image Segmentation; Image-text matching. Code available (2).

VinVL: Revisiting Visual Representations in Vision-Language Models
Jan 2, 2021. Tasks: Image Captioning; Image-text matching. Code available (2).

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
Mar 22, 2023. Tasks: Image-text matching; Language Modeling. Code available (2).

FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization
Jan 17, 2025. Tasks: Anomaly Detection; Image-text matching. Code available (2).

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Apr 13, 2020. Tasks: Cross-Modal Retrieval; Image Captioning. Code available (2).

Language Models Can See: Plugging Visual Controls in Text Generation
May 5, 2022. Tasks: Image Captioning; Image-text matching. Code available (2).

Negative Pre-aware for Noisy Cross-modal Matching
Dec 10, 2023. Tasks: Cross-modal retrieval with noisy correspondence; Image-text matching. Code available (1).

Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation
Mar 29, 2023. Tasks: Image Captioning; Image-text matching. Code available (1).

More Grounded Image Captioning by Distilling Image-Text Matching Model
Apr 1, 2020. Tasks: Image Captioning; Image-text matching. Code available (1).

Negative-Aware Attention Framework for Image-Text Matching
Jan 1, 2022. Tasks: Image-text matching; Text Matching. Code available (1).

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval
Dec 6, 2022. Tasks: Cross-Modal Retrieval; Image-text matching. Code available (1).

MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Jan 29, 2022. Tasks: Image-text matching; Language Modeling. Code available (1).

Text-Guided Neural Image Inpainting
Apr 7, 2020. Tasks: Descriptive; Image Generation. Code available (1).

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
May 18, 2023. Tasks: Attribute; Image Generation. Code available (1).

Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Oct 22, 2020. Tasks: Cross-Modal Retrieval; Graph Attention. Code available (1).

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
Mar 2, 2025. Tasks: Image Segmentation; Image-text matching. Code available (1).

Learning Semantic Relationship Among Instances for Image-Text Matching
Jan 1, 2023. Tasks: Cross-Modal Retrieval; Image Retrieval. Code available (1).

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
May 18, 2023. Tasks: Image Generation; Image-text matching. Code available (1).

Learning with Noisy Correspondence for Cross-modal Matching
Dec 1, 2021. Tasks: Cross-Modal Retrieval; Cross-modal retrieval with noisy correspondence. Code available (1).

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Jul 16, 2021. Tasks: Cross-Modal Retrieval; Grounded language learning. Code available (1).

MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
Nov 16, 2023. Tasks: Binary Classification; Descriptive. Code available (1).

A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
Jun 4, 2021. Tasks: Graph Matching; Image Retrieval. Code available (1).

MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Oct 11, 2022. Tasks: Contrastive Learning; Image-text matching. Code available (1).

Graph Structured Network for Image-Text Matching
Apr 1, 2020. Tasks: Attribute; Cross-Modal Retrieval. Code available (1).

CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Mar 5, 2025. Tasks: Adversarial Robustness; Image-text matching. Code available (1).

CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Feb 27, 2025. Tasks: Image-text matching; Object. Code available (1).

ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation
Feb 7, 2024. Tasks: Image Generation; Image-text matching. Code available (1).

Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network
Jan 1, 2023. Tasks: Image-text matching; Retrieval. Code available (1).

GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
Aug 8, 2022. Tasks: Image-text matching; Language Modeling. Code available (1).

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
Jul 17, 2020. Tasks: Image Captioning; Image-text matching. Code available (1).

ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
Apr 7, 2022. Tasks: Image-text matching; Text Matching. Code available (1).

Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models
Jun 10, 2025. Tasks: Contrastive Learning; Image-text matching. Code available (1).

Cross-modal Active Complementary Learning with Self-refining Correspondence
Oct 26, 2023. Tasks: Cross-modal retrieval with noisy correspondence; Image-text matching. Code available (1).

Composing Object Relations and Attributes for Image-Text Matching
Jun 17, 2024. Tasks: Attribute; Graph Attention. Code available (1).

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
Nov 28, 2017. Tasks: Generative Adversarial Network; Image Generation. Code available (1).

Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Jul 21, 2023. Tasks: Image-text matching; Text Matching. Code available (1).

Improved Probabilistic Image-Text Representations
May 29, 2023. Tasks: Data Augmentation; Image-text matching. Code available (1).

ComCLIP: Training-Free Compositional Image and Text Matching
Nov 25, 2022. Tasks: Image-text matching; Image-text Retrieval. Code available (1).

Declaration-based Prompt Tuning for Visual Question Answering
May 5, 2022. Tasks: Image-text matching; Language Modeling. Code available (1).

Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Apr 28, 2024. Tasks: Contrastive Learning; Image-text matching. Code available (1).

Adaptive Offline Quintuplet Loss for Image-Text Matching
Mar 7, 2020. Tasks: Image-text matching; Text Matching. Code available (1).

Deep Multimodal Neural Architecture Search
Apr 25, 2020. Tasks: Decoder; Image-text matching. Code available (1).

BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
Mar 22, 2023. Tasks: Cross-modal retrieval with noisy correspondence; Image-text matching. Code available (1).

Are Diffusion Models Vision-And-Language Reasoners?
May 25, 2023. Tasks: Denoising; Image Generation. Code available (1).

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Dec 2, 2021. Tasks: Image-text matching; Instance Segmentation. Code available (1).

BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding
Feb 25, 2023. Tasks: Brain Decoding; Image Generation. Code available (1).

Image-text matching for large-scale book collections
Jul 29, 2024. Tasks: Image-text matching; Optical Character Recognition (OCR). Code available (1).