Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models (Jun 12, 2025). Tasks: Anatomy, Image Generation. Unverified (0).
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures (May 16, 2025). Tasks: Coreference Resolution. Code available (0).
A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data (Mar 2, 2025). Tasks: Object Detection. Unverified (0).
Progressive Local Alignment for Medical Multimodal Pre-training (Feb 25, 2025). Tasks: Contrastive Learning, Image-text Retrieval. Unverified (0).
Anatomical grounding pre-training for medical phrase grounding (Feb 23, 2025). Tasks: Phrase Grounding, Zero-Shot Learning. Code available (0).
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback (Jan 29, 2025). Tasks: Phrase Grounding. Code available (0).
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding (Jan 28, 2025). Tasks: Object Detection. Unverified (0).
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension (Jan 2, 2025). Tasks: Generalized Referring Expression Comprehension, Generalized Referring Expression Segmentation. Unverified (0).
Towards Visual Grounding: A Survey (Dec 28, 2024). Tasks: Phrase Grounding, Referring Expression. Code available (3).
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (Dec 12, 2024). Tasks: Phrase Grounding, Question Answering. Unverified (0).
Context-Infused Visual Grounding for Art (Oct 16, 2024). Tasks: Object Detection. Code available (0).
Transformer with Controlled Attention for Synchronous Motion Captioning (Sep 13, 2024). Tasks: Action Localization, Action Segmentation. Code available (0).
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data (Aug 30, 2024). Tasks: Hallucination, Phrase Grounding. Unverified (0).
A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training (Aug 20, 2024). Tasks: Autonomous Vehicles, Computational Efficiency. Code available (0).
CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting (Jul 11, 2024). Tasks: Data Augmentation, Phrase Grounding. Unverified (0).
Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents (Jul 1, 2024). Tasks: Emotional Intelligence, Emotion Classification. Code available (0).
Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM (Apr 29, 2024). Tasks: Phrase Grounding, Scene Understanding. Unverified (0).
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models (Apr 19, 2024). Tasks: Contrastive Learning, Phrase Grounding. Code available (0).
MedRG: Medical Report Grounding with Multi-modal Large Language Model (Apr 10, 2024). Tasks: Decoder, Language Modeling. Unverified (0).
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring (Mar 14, 2024). Tasks: Object, Object Counting. Code available (0).
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training (Mar 4, 2024). Tasks: Math, Phrase Grounding. Unverified (0).
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding (Feb 29, 2024). Tasks: Causal Inference, Counterfactual. Unverified (0).
Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection (Feb 2, 2024). Tasks: Object Detection. Unverified (0).
Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement (Jan 21, 2024). Tasks: Medical Image Analysis, Phrase Grounding. Unverified (0).
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection (Jan 4, 2024). Tasks: Described Object Detection, Phrase Grounding. Code available (1).
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models (Nov 22, 2023). Tasks: Benchmarking, Phrase Grounding. Code available (2).
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models (Nov 5, 2023). Tasks: Data Augmentation, Phrase Grounding. Code available (0).
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge (Oct 23, 2023). Tasks: Phrase Grounding, World Knowledge. Code available (0).
Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning (Sep 12, 2023). Tasks: Contrastive Learning, Medical Image Analysis. Code available (1).
Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks (Sep 7, 2023). Tasks: Object Discovery, Phrase Grounding. Code available (0).
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models (Sep 6, 2023). Tasks: Phrase Grounding. Code available (0).
A Survey on Interpretable Cross-modal Reasoning (Sep 5, 2023). Tasks: Cross-Modal Retrieval, Decision Making. Code available (1).
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications (Aug 30, 2023). Tasks: Decoder, Object Detection. Unverified (0).
Kosmos-2: Grounding Multimodal Large Language Models to the World (Jun 26, 2023). Tasks: Image Captioning, In-Context Learning. Code available (1).
Read, look and detect: Bounding box annotation from image-caption pairs (Jun 9, 2023). Tasks: Object, Object Detection. Unverified (0).
ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity (Apr 11, 2023). Tasks: Phrase Grounding. Unverified (0).
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language (Apr 10, 2023). Tasks: Image Retrieval, Phrase Grounding. Unverified (0).
Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability (Mar 31, 2023). Tasks: Conditional Image Generation, Image Generation. Code available (0).
LIMITR: Leveraging Local Information for Medical Image-Text Representation (Mar 21, 2023). Tasks: Image Retrieval, Phrase Grounding. Unverified (0).
Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection (Mar 17, 2023). Tasks: Attribute, Contrastive Learning. Unverified (0).
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment (Mar 14, 2023). Tasks: Medical Image Analysis, Phrase Grounding. Unverified (0).
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing (Jan 11, 2023). Tasks: Phrase Grounding, Self-Supervised Learning. Code available (0).
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding (Jan 1, 2023). Tasks: Phrase Grounding. Code available (0).
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding (Nov 28, 2022). Tasks: Object Detection. Code available (1).
Extending Phrase Grounding with Pronouns in Visual Dialogues (Oct 23, 2022). Tasks: Phrase Grounding. Code available (0).
Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding (Oct 7, 2022). Tasks: Anatomy, Phrase Grounding. Unverified (0).
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network (Sep 10, 2022). Tasks: Continual Learning, Object. Code available (3).
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs (Jun 19, 2022). Tasks: Benchmarking, Image Captioning. Code available (1).
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (Jun 15, 2022). Tasks: Described Object Detection, Image Captioning. Code available (1).
GLIPv2: Unifying Localization and Vision-Language Understanding (Jun 12, 2022). Tasks: 2D Object Detection, Contrastive Learning. Code available (4).