BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation; Jan 28, 2022; Tasks: Image Captioning, Image-text matching; Code available (55)
Language Models Can See: Plugging Visual Controls in Text Generation; May 5, 2022; Tasks: Image Captioning, Image-text matching; Code available (25)
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval; Mar 22, 2023; Tasks: Image-text matching, Language Modeling; Code available (25)
FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization; Jan 17, 2025; Tasks: Anomaly Detection, Image-text matching; Code available (25)
MouSi: Poly-Visual-Expert Vision-Language Models; Jan 30, 2024; Tasks: Image Segmentation, Image-text matching; Code available (25)
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks; Apr 13, 2020; Tasks: Cross-Modal Retrieval, Image Captioning; Code available (25)
VinVL: Revisiting Visual Representations in Vision-Language Models; Jan 2, 2021; Tasks: Image Captioning, Image-text matching; Code available (25)
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models; Jul 24, 2023; Tasks: Image Generation, Image-text matching; Code available (25)
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching; Mar 19, 2025; Tasks: Image-text matching, Text Matching; Code available (25)
Negative Pre-aware for Noisy Cross-modal Matching; Dec 10, 2023; Tasks: Cross-modal retrieval with noisy correspondence, Image-text matching; Code available (15)
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning; Jan 29, 2022; Tasks: Image-text matching, Language Modeling; Code available (15)
Image-text matching for large-scale book collections; Jul 29, 2024; Tasks: Image-text matching, Optical Character Recognition (OCR); Code available (15)
Improved Probabilistic Image-Text Representations; May 29, 2023; Tasks: Data Augmentation, Image-text matching; Code available (15)
A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval; Dec 6, 2022; Tasks: Cross-Modal Retrieval, Image-text matching; Code available (15)
Negative-Aware Attention Framework for Image-Text Matching; Jan 1, 2022; Tasks: Image-text matching, Text Matching; Code available (15)
Text-Guided Neural Image Inpainting; Apr 7, 2020; Tasks: Descriptive, Image Generation; Code available (15)
More Grounded Image Captioning by Distilling Image-Text Matching Model; Apr 1, 2020; Tasks: Image Captioning, Image-text matching; Code available (15)
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model; Oct 11, 2022; Tasks: Contrastive Learning, Image-text matching; Code available (15)
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method; Jul 21, 2023; Tasks: Image-text matching, Text Matching; Code available (15)
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network; Jan 1, 2023; Tasks: Image-text matching, Retrieval; Code available (15)
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching; Apr 28, 2024; Tasks: Contrastive Learning, Image-text matching; Code available (15)
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts; Nov 16, 2023; Tasks: Binary Classification, Descriptive; Code available (15)
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation; Jul 16, 2021; Tasks: Cross-Modal Retrieval, Grounded language learning; Code available (15)
Graph Structured Network for Image-Text Matching; Apr 1, 2020; Tasks: Attribute, Cross-Modal Retrieval; Code available (15)
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval; Jun 4, 2021; Tasks: Graph Matching, Image Retrieval; Code available (15)
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training; Aug 8, 2022; Tasks: Image-text matching, Language Modeling; Code available (15)
MedICaT: A Dataset of Medical Images, Captions, and Textual References; Oct 12, 2020; Tasks: document understanding, Image-text matching; Code available (15)
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP; Mar 5, 2025; Tasks: Adversarial Robustness, Image-text matching; Code available (15)
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation; Feb 27, 2025; Tasks: Image-text matching, Object; Code available (15)
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation; Feb 7, 2024; Tasks: Image Generation, Image-text matching; Code available (15)
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching; Oct 22, 2020; Tasks: Cross-Modal Retrieval, Graph Attention; Code available (15)
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO; Apr 7, 2022; Tasks: Image-text matching, Text Matching; Code available (15)
Learning Semantic Relationship Among Instances for Image-Text Matching; Jan 1, 2023; Tasks: Cross-Modal Retrieval, Image Retrieval; Code available (15)
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching; Jul 17, 2020; Tasks: Image Captioning, Image-text matching; Code available (15)
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners; May 18, 2023; Tasks: Image Generation, Image-text matching; Code available (15)
Cross-modal Active Complementary Learning with Self-refining Correspondence; Oct 26, 2023; Tasks: Cross-modal retrieval with noisy correspondence, Image-text matching; Code available (15)
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis; Mar 2, 2025; Tasks: Image Segmentation, Image-text matching; Code available (15)
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks; Nov 28, 2017; Tasks: Generative Adversarial Network, Image Generation; Code available (15)
Composing Object Relations and Attributes for Image-Text Matching; Jun 17, 2024; Tasks: Attribute, Graph Attention; Code available (15)
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models; Jun 10, 2025; Tasks: Contrastive Learning, Image-text matching; Code available (15)
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation; May 18, 2023; Tasks: Attribute, Image Generation; Code available (15)
Declaration-based Prompt Tuning for Visual Question Answering; May 5, 2022; Tasks: Image-text matching, Language Modeling; Code available (15)
ComCLIP: Training-Free Compositional Image and Text Matching; Nov 25, 2022; Tasks: Image-text matching, Image-text Retrieval; Code available (15)
Adaptive Offline Quintuplet Loss for Image-Text Matching; Mar 7, 2020; Tasks: Image-text matching, Text Matching; Code available (15)
Deep Multimodal Neural Architecture Search; Apr 25, 2020; Tasks: Decoder, Image-text matching; Code available (15)
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency; Mar 22, 2023; Tasks: Cross-modal retrieval with noisy correspondence, Image-text matching; Code available (15)
Are Diffusion Models Vision-And-Language Reasoners?; May 25, 2023; Tasks: Denoising, Image Generation; Code available (15)
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting; Dec 2, 2021; Tasks: Image-text matching, Instance Segmentation; Code available (15)
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding; Feb 25, 2023; Tasks: Brain Decoding, Image Generation; Code available (15)
Learning with Noisy Correspondence for Cross-modal Matching; Dec 1, 2021; Tasks: Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence; Code available (15)