GLIPv2: Unifying Localization and Vision-Language Understanding | Jun 12, 2022 | 2D Object Detection, Contrastive Learning | Code | Code Available | 4
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network | Sep 10, 2022 | Continual Learning, Object | Code | Code Available | 3
Towards Visual Grounding: A Survey | Dec 28, 2024 | Phrase Grounding, Referring Expression | Code | Code Available | 3
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | Jan 1, 2021 | Phrase Grounding, Question Answering | Code | Code Available | 2
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Nov 22, 2023 | Benchmarking, Phrase Grounding | Code | Code Available | 2
Learning Cross-modal Context Graph for Visual Grounding | Feb 13, 2020 | Graph Matching, Graph Neural Network | Code | Code Available | 1
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code | Code Available | 1
A Survey on Interpretable Cross-modal Reasoning | Sep 5, 2023 | Cross-Modal Retrieval, Decision Making | Code | Code Available | 1
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | May 19, 2015 | Image Description, Phrase Grounding | Code | Code Available | 1
Contrastive Learning for Weakly Supervised Phrase Grounding | Jun 17, 2020 | Contrastive Learning, Language Modeling | Code | Code Available | 1
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships | Mar 27, 2022 | Contrastive Learning, Phrase Grounding | Code | Code Available | 1
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation | Jul 3, 2020 | Contrastive Learning, Knowledge Distillation | Code | Code Available | 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection, Image Captioning | Code | Code Available | 1
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding | Oct 12, 2020 | Phrase Grounding | Code | Code Available | 1
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection | Jan 4, 2024 | Described Object Detection, Phrase Grounding | Code | Code Available | 1
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Apr 26, 2021 | Generalized Referring Expression Comprehension, Phrase Grounding | Code | Code Available | 1
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding | Nov 28, 2022 | object-detection, Object Detection | Code | Code Available | 1
Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning, In-Context Learning | Code | Code Available | 1
Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning | Sep 12, 2023 | Contrastive Learning, Medical Image Analysis | Code | Code Available | 1
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | May 23, 2022 | Language Modeling, Language Modelling | Code | Code Available | 1
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | - | Unverified | 0
A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data | Mar 2, 2025 | object-detection, Object Detection | - | Unverified | 0
Grounding Plural Phrases: Countering Evaluation Biases by Individuation | Jun 1, 2021 | Phrase Grounding | - | Unverified | 0
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension | Jan 2, 2025 | Generalized Referring Expression Comprehension, Generalized Referring Expression Segmentation | - | Unverified | 0
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding | Feb 29, 2024 | Causal Inference, counterfactual | - | Unverified | 0
Improving Pre-trained Vision-and-Language Embeddings for Phrase Grounding | Nov 1, 2021 | Multimodal Reasoning, Phrase Grounding | - | Unverified | 0
Knowledge Aided Consistency for Weakly Supervised Phrase Grounding | Mar 11, 2018 | Phrase Grounding | - | Unverified | 0
Language Features Matter: Effective Language Representations for Vision-Language Tasks | Aug 17, 2019 | Image Captioning, Language Modelling | - | Unverified | 0
Learning Deep Structure-Preserving Image-Text Embeddings | Nov 19, 2015 | Image Retrieval, Image to text | - | Unverified | 0
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding | Jan 28, 2025 | object-detection, Object Detection | - | Unverified | 0
LIMITR: Leveraging Local Information for Medical Image-Text Representation | Mar 21, 2023 | Image Retrieval, Phrase Grounding | - | Unverified | 0
Lite-MDETR: A Lightweight Multi-Modal Detector | Jan 1, 2022 | object-detection, Object Detection | - | Unverified | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | - | Unverified | 0
CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting | Jul 11, 2024 | Data Augmentation, Phrase Grounding | - | Unverified | 0
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment | Mar 14, 2023 | Medical Image Analysis, Phrase Grounding | - | Unverified | 0
MedRG: Medical Report Grounding with Multi-modal Large Language Model | Apr 10, 2024 | Decoder, Language Modeling | - | Unverified | 0
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment | Mar 27, 2019 | Image Retrieval, Phrase Grounding | - | Unverified | 0
Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding | Oct 7, 2022 | Anatomy, Phrase Grounding | - | Unverified | 0
Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models | Jun 12, 2025 | Anatomy, Image Generation | - | Unverified | 0
Disentangled Motif-aware Graph Learning for Phrase Grounding | Apr 13, 2021 | Diversity, Graph Learning | - | Unverified | 0
Neural Sequential Phrase Grounding (SeqGROUND) | Mar 18, 2019 | Phrase Grounding | - | Unverified | 0
Dynamic Conditional Networks for Few-Shot Learning | Sep 1, 2018 | Face Generation, Few-Shot Learning | - | Unverified | 0
Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection | Feb 2, 2024 | object-detection, Object Detection | - | Unverified | 0
ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity | Apr 11, 2023 | Phrase Grounding | - | Unverified | 0
PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding | Dec 7, 2018 | Phrase Grounding, Sentence | - | Unverified | 0
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data | Aug 30, 2024 | Hallucination, Phrase Grounding | - | Unverified | 0
Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning, Image-text Retrieval | - | Unverified | 0
Propagating Over Phrase Relations for One-Stage Visual Grounding | Aug 1, 2020 | Phrase Grounding, Relational Reasoning | - | Unverified | 0
Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM | Apr 29, 2024 | Phrase Grounding, Scene Understanding | - | Unverified | 0
Query-guided Regression Network with Context Policy for Phrase Grounding | Aug 4, 2017 | Phrase Grounding, regression | - | Unverified | 0