CCMB: A Large-scale Chinese Cross-modal Benchmark | May 8, 2022 | image-classification, Image Classification | Code Available | 1
Declaration-based Prompt Tuning for Visual Question Answering | May 5, 2022 | Image-text matching, Language Modeling | Code Available | 1
No Token Left Behind: Explainability-Aided Image Classification and Generation | Apr 11, 2022 | image-classification, Image Classification | Code Available | 1
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO | Apr 7, 2022 | Image-text matching, Text Matching | Code Available | 1
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning | Jan 29, 2022 | Image-text matching, Language Modeling | Code Available | 1
Negative-Aware Attention Framework for Image-Text Matching | Jan 1, 2022 | Image-text matching, Text Matching | Code Available | 1
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting | Dec 2, 2021 | Image-text matching, Instance Segmentation | Code Available | 1
Learning with Noisy Correspondence for Cross-modal Matching | Dec 1, 2021 | Cross-Modal Retrieval, Cross-modal retrieval with noisy correspondence | Code Available | 1
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation | Jul 16, 2021 | Cross-Modal Retrieval, Grounded language learning | Code Available | 1
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval | Jun 4, 2021 | Graph Matching, Image Retrieval | Code Available | 1
Similarity Reasoning and Filtration for Image-Text Matching | Jan 5, 2021 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching | Oct 22, 2020 | Cross-Modal Retrieval, Graph Attention | Code Available | 1
MedICaT: A Dataset of Medical Images, Captions, and Textual References | Oct 12, 2020 | document understanding, Image-text matching | Code Available | 1
Universal Weighting Metric Learning for Cross-Modal Matching | Oct 7, 2020 | Image-text matching, Metric Learning | Code Available | 1
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Jul 17, 2020 | Image Captioning, Image-text matching | Code Available | 1
Deep Multimodal Neural Architecture Search | Apr 25, 2020 | Decoder, Image-text matching | Code Available | 1
Transformer Reasoning Network for Image-Text Matching and Retrieval | Apr 20, 2020 | Image Retrieval, Image-text matching | Code Available | 1
Text-Guided Neural Image Inpainting | Apr 7, 2020 | Descriptive, Image Generation | Code Available | 1
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Apr 2, 2020 | Image-text matching, Image-text Retrieval | Code Available | 1
More Grounded Image Captioning by Distilling Image-Text Matching Model | Apr 1, 2020 | Image Captioning, Image-text matching | Code Available | 1
Graph Structured Network for Image-Text Matching | Apr 1, 2020 | Attribute, Cross-Modal Retrieval | Code Available | 1
Adaptive Offline Quintuplet Loss for Image-Text Matching | Mar 7, 2020 | Image-text matching, Text Matching | Code Available | 1
UNITER: UNiversal Image-TExt Representation Learning | Sep 25, 2019 | Image-text matching, Image-text Retrieval | Code Available | 1
Visual Semantic Reasoning for Image-Text Matching | Sep 6, 2019 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | Aug 22, 2019 | Image-text matching, Language Modelling | Code Available | 1
Stacked Cross Attention for Image-Text Matching | Mar 21, 2018 | Cross-Modal Retrieval, Image Retrieval | Code Available | 1
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks | Nov 28, 2017 | Generative Adversarial Network, Image Generation | Code Available | 1
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | May 24, 2025 | Image Captioning, Image Generation | Unverified | 0
Descriptive Image-Text Matching with Graded Contextual Similarity | May 15, 2025 | Descriptive, Image-text matching | Unverified | 0
Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image Captioning, Image-text matching | Code Available | 0
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Apr 16, 2025 | Image Augmentation, Image Generation | Unverified | 0
Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis | Apr 15, 2025 | Aspect-Based Sentiment Analysis, Dependency Parsing | Unverified | 0
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | image-classification, Image Classification | Unverified | 0
Object-centric Binding in Contrastive Language-Image Pretraining | Feb 19, 2025 | Image-text matching, Object | Unverified | 0
MASS: Overcoming Language Bias in Image-Text Matching | Jan 20, 2025 | Image-text matching, Image-text Retrieval | Unverified | 0
Learning Textual Prompts for Open-World Semi-Supervised Learning | Jan 1, 2025 | Image-text matching, Open-World Semi-Supervised Learning | Unverified | 0
Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Dec 26, 2024 | Image-text matching, Text Matching | Unverified | 0
A Concept-Centric Approach to Multi-Modality Learning | Dec 18, 2024 | Image-text matching, Question Answering | Unverified | 0
ViUniT: Visual Unit Tests for More Robust Visual Programming | Dec 12, 2024 | Image Generation, Image-text matching | Unverified | 0
Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Nov 28, 2024 | Anomaly Detection, Image-text matching | Unverified | 0
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis | Nov 27, 2024 | Human-Object Interaction Detection, Image-text matching | Unverified | 0
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning | Oct 23, 2024 | Contrastive Learning, Image-text matching | Unverified | 0
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching | Oct 22, 2024 | Contrastive Learning, Image-text matching | Unverified | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | Unverified | 0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment | Unverified | 0
Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute, Image-text matching | Code Available | 0
Towards Deconfounded Image-Text Matching with Causal Inference | Aug 22, 2024 | Causal Inference, Image-text matching | Unverified | 0
Dynamic and Compressive Adaptation of Transformers From Images to Videos | Aug 13, 2024 | Image-text matching, Text Matching | Unverified | 0
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model | Jun 18, 2024 | Image-text matching, Language Modeling | Code Available | 0
Generative Visual Instruction Tuning | Jun 17, 2024 | Image Generation, Image-text matching | Code Available | 0