ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation | Aug 31, 2023 | Image-text matching, Language Modeling | Unverified
ViUniT: Visual Unit Tests for More Robust Visual Programming | Dec 12, 2024 | Image Generation, Image-text matching | Unverified
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching | Jan 1, 2023 | Image-text matching, Image-text Retrieval | Unverified
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis | Nov 27, 2024 | Human-Object Interaction Detection, Image-text matching | Unverified
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching | May 12, 2021 | Image-text matching, Referring Expression | Unverified
Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging | Oct 6, 2020 | Image Classification, Image-text matching | Unverified
Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | Jan 1, 2023 | Image Segmentation, Image-text matching | Unverified
Probing the Role of Positional Information in Vision-Language Models | Jan 16, 2022 | Contrastive Learning, Image-text matching | Unverified
Probing the Role of Positional Information in Vision-Language Models | May 17, 2023 | Contrastive Learning, Image-text matching | Unverified
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training | Mar 9, 2023 | Image-text matching, Language Modeling | Unverified
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | May 11, 2024 | Image-text matching, Retrieval | Unverified
Scene Text Recognition with Image-Text Matching-guided Dictionary | May 8, 2023 | Image-text matching, Language Modeling | Unverified
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking | Aug 12, 2019 | Binary Classification, General Classification | Code Available
Position Focused Attention Network for Image-Text Matching | Jul 23, 2019 | Image-text matching, Position | Code Available
Dual Attention Networks for Multimodal Reasoning and Matching | Nov 2, 2016 | Collaborative Inference, Image-text matching | Code Available
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? | Oct 21, 2022 | Image-text matching, Language Modeling | Code Available
MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval | May 18, 2023 | Image-text matching, Retrieval | Code Available
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity, Image Description | Code Available
Learning Two-Branch Neural Networks for Image-Text Matching Tasks | Apr 11, 2017 | Image-text matching, Retrieval | Code Available
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking | Jan 29, 2024 | Image-text matching, Text Matching | Code Available
Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching, Image-text Retrieval | Code Available
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Apr 21, 2023 | Cross-Modal Retrieval, Image-text matching | Code Available
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Sep 28, 2023 | cross-modal alignment, Cross-Modal Retrieval | Code Available
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval | Jul 29, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available
Learning fragment self-attention embeddings for image-text matching | Oct 1, 2019 | Image-text matching, Sentence | Code Available
Integrating Language Guidance Into Image-Text Matching for Correcting False Negatives | Mar 24, 2023 | Cross-modal retrieval with noisy correspondence, Image-text matching | Code Available
Increasing Textual Context Size Boosts Medical Image-Text Matching | Mar 23, 2023 | Image-text matching, Text Matching | Code Available
Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks | Sep 14, 2023 | Image-text matching, Sarcasm Detection | Code Available
Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information | May 2, 2023 | Bayesian Inference, Image-text matching | Code Available
GR-GAN: Gradual Refinement Text-to-image Generation | May 23, 2022 | Generative Adversarial Network, Image Generation | Code Available
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model | Jun 18, 2024 | Image-text matching, Language Modeling | Code Available
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering | Sep 9, 2023 | Image Captioning, Image-text matching | Code Available
Deep Cross-Modal Projection Learning for Image-Text Matching | Sep 1, 2018 | Cross-Modal Retrieval, Image-text matching | Code Available
Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image Captioning, Image-text matching | Code Available
Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP | Jan 1, 2024 | Backdoor Attack, Contrastive Learning | Code Available
Generative Visual Instruction Tuning | Jun 17, 2024 | Image Generation, Image-text matching | Code Available
Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute, Image-text matching | Code Available