GLIPv2: Unifying Localization and Vision-Language Understanding Jun 12, 2022 2D Object Detection Contrastive Learning Code Available 45
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network Sep 10, 2022 Continual Learning Object Code Available 35
Towards Visual Grounding: A Survey Dec 28, 2024 Phrase Grounding Referring Expression Code Available 35
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models Nov 22, 2023 Benchmarking Phrase Grounding Code Available 25
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding Jan 1, 2021 Phrase Grounding Question Answering Code Available 25
Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning Sep 12, 2023 Contrastive Learning Medical Image Analysis Code Available 15
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation Jul 3, 2020 Contrastive Learning Knowledge Distillation Code Available 15
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models May 19, 2015 Image Description Phrase Grounding Code Available 15
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone Jun 15, 2022 Described Object Detection Image Captioning Code Available 15
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection Jan 4, 2024 Described Object Detection Phrase Grounding Code Available 15
A Survey on Interpretable Cross-modal Reasoning Sep 5, 2023 Cross-Modal Retrieval Decision Making Code Available 15
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models May 23, 2022 Language Modeling Language Modelling Code Available 15
Contrastive Learning for Weakly Supervised Phrase Grounding Jun 17, 2020 Contrastive Learning Language Modeling Code Available 15
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding Code Available 15
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships Mar 27, 2022 Contrastive Learning Phrase Grounding Code Available 15
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs Jun 19, 2022 Benchmarking Image Captioning Code Available 15
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding Oct 12, 2020 Phrase Grounding Code Available 15
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding Nov 28, 2022 object-detection Object Detection Code Available 15
Learning Cross-modal Context Graph for Visual Grounding Feb 13, 2020 Graph Matching Graph Neural Network Code Available 15
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning Code Available 15
Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models Apr 19, 2024 Contrastive Learning Phrase Grounding Code Available 05
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models Sep 6, 2023 Phrase Grounding Code Available 05
Anatomical grounding pre-training for medical phrase grounding Feb 23, 2025 Phrase Grounding Zero-Shot Learning Code Available 05
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models Nov 5, 2023 Data Augmentation Phrase Grounding Code Available 05
Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks Sep 7, 2023 Object Discovery Phrase Grounding Code Available 05
Conditional Image-Text Embedding Networks Nov 22, 2017 Phrase Grounding Code Available 05
Context-Infused Visual Grounding for Art Oct 16, 2024 object-detection Object Detection Code Available 05
Detector-Free Weakly Supervised Grounding by Separation Apr 20, 2021 Phrase Grounding Code Available 05
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures May 16, 2025 coreference-resolution Coreference Resolution Code Available 05
Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents Jul 1, 2024 Emotional Intelligence Emotion Classification Code Available 05
Extending Phrase Grounding with Pronouns in Visual Dialogues Oct 23, 2022 Phrase Grounding Code Available 05
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring Mar 14, 2024 Object Object Counting Code Available 05
Grounding of Textual Phrases in Images by Reconstruction Nov 12, 2015 Language Modeling Language Modelling Code Available 05
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing Jan 11, 2023 Phrase Grounding Self-Supervised Learning Code Available 05
Learning to ground medical text in a 3D human atlas Nov 1, 2020 Phrase Grounding Visual Grounding Code Available 05
A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training Aug 20, 2024 Autonomous Vehicles Computational Efficiency Code Available 05
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge Oct 23, 2023 Phrase Grounding World Knowledge Code Available 05
Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing Apr 21, 2022 Contrastive Learning Language Modeling Code Available 05
Modularized Textual Grounding for Counterfactual Resilience Apr 7, 2019 Attribute counterfactual Code Available 05
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding Nov 28, 2018 Language Modeling Language Modelling Code Available 05
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Jun 6, 2016 Phrase Grounding Visual Grounding Code Available 05
Natural Language Object Retrieval Nov 13, 2015 Image Captioning Image Retrieval Code Available 05
Revisiting Image-Language Networks for Open-ended Phrase Detection Nov 17, 2018 object-detection Object Detection Code Available 05
Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability Mar 31, 2023 Conditional Image Generation Image Generation Code Available 05
Phrase Grounding by Soft-Label Chain Conditional Random Field Sep 1, 2019 Phrase Grounding Structured Prediction Code Available 05
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding May 9, 2018 Diversity Phrase Grounding Code Available 05
Neural Parameter Allocation Search Jun 18, 2020 Image Classification Phrase Grounding Code Available 05
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding Jan 1, 2023 Phrase Grounding Code Available 05
Transformer with Controlled Attention for Synchronous Motion Captioning Sep 13, 2024 Action Localization Action Segmentation Code Available 05
VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback Jan 29, 2025 Phrase Grounding Code Available 05