GLIPv2: Unifying Localization and Vision-Language Understanding | Jun 12, 2022 | 2D Object Detection Contrastive Learning | Code Available | 4
Towards Visual Grounding: A Survey | Dec 28, 2024 | Phrase Grounding Referring Expression | Code Available | 3
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Sep 10, 2022 | Continual Learning Object | Code Available | 3
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Nov 22, 2023 | Benchmarking Phrase Grounding | Code Available | 2
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Jan 1, 2021 | Phrase Grounding Question Answering | Code Available | 2
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Jan 4, 2024 | Described Object Detection Phrase Grounding | Code Available | 1
Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning | Sep 12, 2023 | Contrastive Learning Medical Image Analysis | Code Available | 1
A Survey on Interpretable Cross-modal Reasoning | Sep 5, 2023 | Cross-Modal Retrieval Decision Making | Code Available | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning In-Context Learning | Code Available | 1
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Nov 28, 2022 | object-detection Object Detection | Code Available | 1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking Image Captioning | Code Available | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection Image Captioning | Code Available | 1
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modeling Language Modelling | Code Available | 1
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships | Mar 27, 2022 | Contrastive Learning Phrase Grounding | Code Available | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension Phrase Grounding | Code Available | 1
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding | Oct 12, 2020 | Phrase Grounding | Code Available | 1
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation | Jul 3, 2020 | Contrastive Learning Knowledge Distillation | Code Available | 1
Contrastive Learning for Weakly Supervised Phrase Grounding | Jun 17, 2020 | Contrastive Learning Language Modeling | Code Available | 1
Learning Cross-modal Context Graph for Visual Grounding | Feb 13, 2020 | Graph Matching Graph Neural Network | Code Available | 1
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | May 19, 2015 | Image Description Phrase Grounding | Code Available | 1
Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models | Jun 12, 2025 | Anatomy Image Generation | Unverified | 0
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures | May 16, 2025 | coreference-resolution Coreference Resolution | Code Available | 0
A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data | Mar 2, 2025 | object-detection Object Detection | Unverified | 0
Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning Image-text Retrieval | Unverified | 0
Anatomical grounding pre-training for medical phrase grounding | Feb 23, 2025 | Phrase Grounding Zero-Shot Learning | Code Available | 0
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback | Jan 29, 2025 | Phrase Grounding | Code Available | 0
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Jan 28, 2025 | object-detection Object Detection | Unverified | 0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | Jan 2, 2025 | Generalized Referring Expression Comprehension Generalized Referring Expression Segmentation | Unverified | 0
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | Dec 12, 2024 | Phrase Grounding Question Answering | Unverified | 0
Context-Infused Visual Grounding for Art | Oct 16, 2024 | object-detection Object Detection | Code Available | 0
Transformer with Controlled Attention for Synchronous Motion Captioning | Sep 13, 2024 | Action Localization Action Segmentation | Code Available | 0
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data | Aug 30, 2024 | Hallucination Phrase Grounding | Unverified | 0
A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training | Aug 20, 2024 | Autonomous Vehicles Computational Efficiency | Code Available | 0
CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting | Jul 11, 2024 | Data Augmentation Phrase Grounding | Unverified | 0
Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | Jul 1, 2024 | Emotional Intelligence Emotion Classification | Code Available | 0
Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM | Apr 29, 2024 | Phrase Grounding Scene Understanding | Unverified | 0
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models | Apr 19, 2024 | Contrastive Learning Phrase Grounding | Code Available | 0
MedRG: Medical Report Grounding with Multi-modal Large Language Model | Apr 10, 2024 | Decoder Language Modeling | Unverified | 0
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | Mar 14, 2024 | Object Object Counting | Code Available | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math Phrase Grounding | Unverified | 0
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding | Feb 29, 2024 | Causal Inference counterfactual | Unverified | 0
Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection | Feb 2, 2024 | object-detection Object Detection | Unverified | 0
Enhancing the vision-language foundation model with key semantic knowledge-emphasized report refinement | Jan 21, 2024 | Medical Image Analysis Phrase Grounding | Unverified | 0
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models | Nov 5, 2023 | Data Augmentation Phrase Grounding | Code Available | 0
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge | Oct 23, 2023 | Phrase Grounding World Knowledge | Code Available | 0
Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks | Sep 7, 2023 | Object Discovery Phrase Grounding | Code Available | 0
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models | Sep 6, 2023 | Phrase Grounding | Code Available | 0
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications | Aug 30, 2023 | Decoder object-detection | Unverified | 0
Read, look and detect: Bounding box annotation from image-caption pairs | Jun 9, 2023 | Object object-detection | Unverified | 0
ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity | Apr 11, 2023 | Phrase Grounding | Unverified | 0