Multi-Modal Representation Learning with Text-Driven Soft Masks (Apr 3, 2023). Tags: Contrastive Learning, Data Augmentation
MURAL: Multimodal, Multitask Representations Across Languages (Nov 1, 2021). Tags: Cross-Modal Retrieval, Image-text matching
MURAL: Multimodal, Multitask Retrieval Across Languages (Sep 10, 2021). Tags: Cross-Modal Retrieval, Image-text matching
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training (Sep 15, 2024). Tags: Contrastive Learning, cross-modal alignment
Object-centric Binding in Contrastive Language-Image Pretraining (Feb 19, 2025). Tags: Image-text matching, Object
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization (Dec 7, 2023). Tags: Adversarial Attack, Data Augmentation
ParNet: Position-aware Aggregated Relation Network for Image-Text matching (Jun 17, 2019). Tags: Image-text matching, Position
Probing the Role of Positional Information in Vision-Language Models (Jan 16, 2022). Tags: Contrastive Learning, Image-text matching
Probing the Role of Positional Information in Vision-Language Models (May 17, 2023). Tags: Contrastive Learning, Image-text matching
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training (Mar 9, 2023). Tags: Image-text matching, Language Modeling
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning (May 11, 2024). Tags: Image-text matching, Retrieval
Scene Text Recognition with Image-Text Matching-guided Dictionary (May 8, 2023). Tags: Image-text matching, Language Modeling
Selectively Hard Negative Mining for Alleviating Gradient Vanishing in Image-Text Matching (Mar 1, 2023). Tags: Image-text matching, Text Matching
Step-Wise Hierarchical Alignment Network for Image-Text Matching (Jun 11, 2021). Tags: Image-text matching, Text Matching
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining (Apr 1, 2024). Tags: Contrastive Learning, Image-text matching
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP (May 24, 2025). Tags: Image Captioning, Image Generation
Towards Deconfounded Image-Text Matching with Causal Inference (Aug 22, 2024). Tags: Causal Inference, Image-text matching
Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features (Jun 1, 2021). Tags: Cross-Modal Retrieval, Image Retrieval
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models (Aug 18, 2023). Tags: Image-text matching, Object Localization
Two-stream Hierarchical Similarity Reasoning for Image-text Matching (Mar 10, 2022). Tags: Image-text matching, Image to text
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training (Apr 1, 2021). Tags: Image-text matching, Image-text Retrieval
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning (Nov 19, 2021). Tags: Image Captioning, Image-text matching
Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking (Sep 15, 2023). Tags: Image-text matching, Re-Ranking
Uncertainty-based Cross-Modal Retrieval with Probabilistic Representations (Apr 20, 2022). Tags: Cross-Modal Retrieval, Image Retrieval
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training (Aug 16, 2019). Tags: Image-text matching, Image-text Retrieval
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators (Sep 22, 2019). Tags: Image Captioning, Image-text matching
Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition (Aug 24, 2023). Tags: Attribute, Image-text matching
Uniform Masking Prevails in Vision-Language Pretraining (Dec 10, 2022). Tags: Image-text matching, Language Modeling
UNITER: Learning UNiversal Image-TExt Representations (Sep 25, 2019). Tags: Image-text matching, Image-text Retrieval
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching (Jan 18, 2022). Tags: Image-text matching, Referring Expression
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance (Oct 28, 2022). Tags: Image Generation, Image-text matching
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation (Aug 31, 2023). Tags: Image-text matching, Language Modeling
ViUniT: Visual Unit Tests for More Robust Visual Programming (Dec 12, 2024). Tags: Image Generation, Image-text matching
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching (Jan 1, 2023). Tags: Image-text matching, Image-text Retrieval
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis (Nov 27, 2024). Tags: Human-Object Interaction Detection, Image-text matching
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching (May 12, 2021). Tags: Image-text matching, Referring Expression
Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging (Oct 6, 2020). Tags: Image Classification, Image-text matching
Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency (Jan 1, 2023). Tags: Image Segmentation, Image-text matching