Advanced Multimodal Deep Learning Architecture for Image-Text Matching | Jun 13, 2024 | Deep Learning; Image-text matching | Unverified
Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | Jun 5, 2024 | cross-modal alignment; Image-text matching | Unverified
DEMO: A Statistical Perspective for Efficient Image-Text Matching | May 19, 2024 | Image-text matching; Model Optimization | Unverified
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering; Audio-Visual Question Answering (AVQA) | Unverified
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | May 11, 2024 | Image-text matching; Retrieval | Unverified
Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching | Apr 29, 2024 | Cross-modal retrieval with noisy correspondence; Image-text matching | Unverified
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining | Apr 1, 2024 | Contrastive Learning; Image-text matching | Unverified
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | Mar 29, 2024 | Image-text matching; Object Recognition | Unverified
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues | Mar 29, 2024 | Image-text matching; Language Modeling | Unverified
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity; Image Description | Code Available
Image-Text Matching with Multi-View Attention | Feb 27, 2024 | Diversity; Image-text matching | Unverified
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking | Jan 29, 2024 | Image-text matching; Text Matching | Code Available
Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching; Image-text Retrieval | Code Available
Backdoor Attack on Unpaired Medical Image-Text Foundation Models: A Pilot Study on MedCLIP | Jan 1, 2024 | Backdoor Attack; Contrastive Learning | Code Available
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization | Dec 7, 2023 | Adversarial Attack; Data Augmentation | Unverified
CILF-CIAE: CLIP-driven Image-Language Fusion for Correcting Inverse Age Estimation | Dec 4, 2023 | Age Estimation; Image-text matching | Unverified
Active Mining Sample Pair Semantics for Image-text Matching | Nov 9, 2023 | Active Learning; Image-text matching | Unverified
A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matching; Image-text Retrieval | Unverified
Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification | Oct 17, 2023 | Image Retrieval; Image-text matching | Unverified
Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search | Sep 28, 2023 | cross-modal alignment; Cross-Modal Retrieval | Code Available
Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking | Sep 15, 2023 | Image-text matching; Re-Ranking | Unverified
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks | Sep 14, 2023 | Image-text matching; Sarcasm Detection | Code Available
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering | Sep 9, 2023 | Image Captioning; Image-text matching | Code Available
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation | Aug 31, 2023 | Image-text matching; Language Modeling | Unverified
Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition | Aug 24, 2023 | Attribute; Image-text matching | Unverified
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching; Image-text Retrieval | Unverified
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching; Object Localization | Unverified
Grounded Image Text Matching with Mismatched Relation Reasoning | Aug 2, 2023 | Image-text matching; Relation | Unverified
MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval | May 18, 2023 | Image-text matching; Retrieval | Code Available
Probing the Role of Positional Information in Vision-Language Models | May 17, 2023 | Contrastive Learning; Image-text matching | Unverified
Scene Text Recognition with Image-Text Matching-guided Dictionary | May 8, 2023 | Image-text matching; Language Modeling | Unverified
Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information | May 2, 2023 | Bayesian Inference; Image-text matching | Code Available
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models | Apr 21, 2023 | Cross-Modal Retrieval; Image-text matching | Code Available
Multi-Modal Representation Learning with Text-Driven Soft Masks | Apr 3, 2023 | Contrastive Learning; Data Augmentation | Unverified
Integrating Language Guidance Into Image-Text Matching for Correcting False Negatives | Mar 24, 2023 | Cross-modal retrieval with noisy correspondence; Image-text matching | Code Available
Increasing Textual Context Size Boosts Medical Image-Text Matching | Mar 23, 2023 | Image-text matching; Text Matching | Code Available
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training | Mar 9, 2023 | Image-text matching; Language Modeling | Unverified
Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching | Mar 1, 2023 | Image-text matching; Text Matching | Unverified
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching | Jan 1, 2023 | Image-text matching; Image-text Retrieval | Unverified
Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | Jan 1, 2023 | Image Segmentation; Image-text matching | Unverified
Multimodal Matching-aware Co-attention Networks with Mutual Knowledge Distillation for Fake News Detection | Dec 12, 2022 | Fake News Detection; Image-text matching | Unverified
Uniform Masking Prevails in Vision-Language Pretraining | Dec 10, 2022 | Image-text matching; Language Modeling | Unverified
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance | Oct 28, 2022 | Image Generation; Image-text matching | Unverified
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies? | Oct 21, 2022 | Image-text matching; Language Modeling | Code Available
Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval; Image-text matching | Code Available
AdsCVLR: Commercial Visual-Linguistic Representation Modeling in Sponsored Search | Oct 10, 2022 | Contrastive Learning; Image-text matching | Unverified
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval | Jul 29, 2022 | Cross-Modal Retrieval; Image-text matching | Code Available
Don't Stop Learning: Towards Continual Learning for the CLIP Model | Jul 19, 2022 | Continual Learning; Image-text matching | Unverified
GR-GAN: Gradual Refinement Text-to-image Generation | May 23, 2022 | Generative Adversarial Network; Image Generation | Code Available
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations | Apr 20, 2022 | Cross-Modal Retrieval; Image Retrieval | Unverified