Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning; Image-text matching | Code Available | 1
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | May 24, 2025 | Image Captioning; Image Generation | Unverified | 0
Descriptive Image-Text Matching with Graded Contextual Similarity | May 15, 2025 | Descriptive; Image-text matching | Unverified | 0
Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image Captioning; Image-text matching | Code Available | 0
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Apr 16, 2025 | Image Augmentation; Image Generation | Unverified | 0
Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis | Apr 15, 2025 | Aspect-Based Sentiment Analysis; Dependency Parsing | Unverified | 0
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching | Mar 19, 2025 | Image-text matching; Text Matching | Code Available | 2
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Mar 5, 2025 | Adversarial Robustness; Image-text matching | Code Available | 1
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | image-classification; Image Classification | Unverified | 0
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Mar 2, 2025 | Image Segmentation; Image-text matching | Code Available | 1
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning | Feb 27, 2025 | Cross-Modal Retrieval; Cross-modal retrieval with noisy correspondence | Code Available | 1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching; Object | Code Available | 1
Object-centric Binding in Contrastive Language-Image Pretraining | Feb 19, 2025 | Image-text matching; Object | Unverified | 0
MASS: Overcoming Language Bias in Image-Text Matching | Jan 20, 2025 | Image-text matching; Image-text Retrieval | Unverified | 0
FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Jan 17, 2025 | Anomaly Detection; Image-text matching | Code Available | 2
Learning Textual Prompts for Open-World Semi-Supervised Learning | Jan 1, 2025 | Image-text matching; Open-World Semi-Supervised Learning | Unverified | 0
Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Dec 26, 2024 | Image-text matching; Text Matching | Unverified | 0
A Concept-Centric Approach to Multi-Modality Learning | Dec 18, 2024 | Image-text matching; Question Answering | Unverified | 0
ViUniT: Visual Unit Tests for More Robust Visual Programming | Dec 12, 2024 | Image Generation; Image-text matching | Unverified | 0
Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Nov 28, 2024 | Anomaly Detection; Image-text matching | Unverified | 0
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis | Nov 27, 2024 | Human-Object Interaction Detection; Image-text matching | Unverified | 0
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning | Oct 23, 2024 | Contrastive Learning; Image-text matching | Unverified | 0
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching | Oct 22, 2024 | Contrastive Learning; Image-text matching | Unverified | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification; Image Classification | Unverified | 0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning; cross-modal alignment | Unverified | 0
Evaluating Attribute Comprehension in Large Vision-Language Models | Aug 25, 2024 | Attribute; Image-text matching | Code Available | 0
Towards Deconfounded Image-Text Matching with Causal Inference | Aug 22, 2024 | Causal Inference; Image-text matching | Unverified | 0
Dynamic and Compressive Adaptation of Transformers From Images to Videos | Aug 13, 2024 | Image-text matching; Text Matching | Unverified | 0
Image-text matching for large-scale book collections | Jul 29, 2024 | Image-text matching; Optical Character Recognition (OCR) | Code Available | 1
UGNCL: Uncertainty-Guided Noisy Correspondence Learning for Efficient Cross-Modal Matching | Jul 11, 2024 | Cross-Modal Retrieval; Cross-modal retrieval with noisy correspondence | Code Available | 1
Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model | Jun 18, 2024 | Image-text matching; Language Modeling | Code Available | 0
Generative Visual Instruction Tuning | Jun 17, 2024 | Image Generation; Image-text matching | Code Available | 0
Composing Object Relations and Attributes for Image-Text Matching | Jun 17, 2024 | Attribute; Graph Attention | Code Available | 1
Advanced Multimodal Deep Learning Architecture for Image-Text Matching | Jun 13, 2024 | Deep Learning; Image-text matching | Unverified | 0
Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching | Jun 5, 2024 | cross-modal alignment; Image-text matching | Unverified | 0
DEMO: A Statistical Perspective for Efficient Image-Text Matching | May 19, 2024 | Image-text matching; Model Optimization | Unverified | 0
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering; Audio-Visual Question Answering (AVQA) | Unverified | 0
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | May 11, 2024 | Image-text matching; Retrieval | Unverified | 0
Breaking Through the Noisy Correspondence: A Robust Model for Image-Text Matching | Apr 29, 2024 | Cross-modal retrieval with noisy correspondence; Image-text matching | Unverified | 0
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching | Apr 28, 2024 | Contrastive Learning; Image-text matching | Code Available | 1
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining | Apr 1, 2024 | Contrastive Learning; Image-text matching | Unverified | 0
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models | Mar 29, 2024 | Image-text matching; Object Recognition | Unverified | 0
FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues | Mar 29, 2024 | Image-text matching; Language Modeling | Unverified | 0
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training | Mar 15, 2024 | Diagnostic; image-classification | Code Available | 1
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Mar 5, 2024 | Diversity; Image Description | Code Available | 0
Image-Text Matching with Multi-View Attention | Feb 27, 2024 | Diversity; Image-text matching | Unverified | 0
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation | Feb 7, 2024 | Image Generation; Image-text matching | Code Available | 1
MouSi: Poly-Visual-Expert Vision-Language Models | Jan 30, 2024 | Image Segmentation; Image-text matching | Code Available | 2
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking | Jan 29, 2024 | Image-text matching; Text Matching | Code Available | 0
Enhancing Image-Text Matching with Adaptive Feature Aggregation | Jan 18, 2024 | Image-text matching; Image-text Retrieval | Code Available | 0