GLIPv2: Unifying Localization and Vision-Language Understanding. Jun 12, 2022. Tasks: 2D Object Detection, Contrastive Learning. Code available.
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network. Sep 10, 2022. Tasks: Continual Learning, Object. Code available.
Towards Visual Grounding: A Survey. Dec 28, 2024. Tasks: Phrase Grounding, Referring Expression. Code available.
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding. Jan 1, 2021. Tasks: Phrase Grounding, Question Answering. Code available.
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models. Nov 22, 2023. Tasks: Benchmarking, Phrase Grounding. Code available.
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding. Nov 28, 2022. Tasks: Object Detection. Code available.
Kosmos-2: Grounding Multimodal Large Language Models to the World. Jun 26, 2023. Tasks: Image Captioning, In-Context Learning. Code available.
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone. Jun 15, 2022. Tasks: Described Object Detection, Image Captioning. Code available.
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models. May 19, 2015. Tasks: Image Description, Phrase Grounding. Code available.
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships. Mar 27, 2022. Tasks: Contrastive Learning, Phrase Grounding. Code available.
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs. Jun 19, 2022. Tasks: Benchmarking, Image Captioning. Code available.
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection. Jan 4, 2024. Tasks: Described Object Detection, Phrase Grounding. Code available.
A Survey on Interpretable Cross-modal Reasoning. Sep 5, 2023. Tasks: Cross-Modal Retrieval, Decision Making. Code available.
Learning Cross-modal Context Graph for Visual Grounding. Feb 13, 2020. Tasks: Graph Matching, Graph Neural Network. Code available.
Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning. Sep 12, 2023. Tasks: Contrastive Learning, Medical Image Analysis. Code available.
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding. Oct 12, 2020. Tasks: Phrase Grounding. Code available.
Contrastive Learning for Weakly Supervised Phrase Grounding. Jun 17, 2020. Tasks: Contrastive Learning, Language Modeling. Code available.
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding. Apr 26, 2021. Tasks: Generalized Referring Expression Comprehension, Phrase Grounding. Code available.
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models. May 23, 2022. Tasks: Language Modeling, Language Modelling. Code available.
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation. Jul 3, 2020. Tasks: Contrastive Learning, Knowledge Distillation. Code available.
Context-Infused Visual Grounding for Art. Oct 16, 2024. Tasks: Object Detection. Code available.
Conditional Image-Text Embedding Networks. Nov 22, 2017. Tasks: Phrase Grounding. Code available.
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing. Jan 11, 2023. Tasks: Phrase Grounding, Self-Supervised Learning. Code available.
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures. May 16, 2025. Tasks: Coreference Resolution. Code available.
Detector-Free Weakly Supervised Grounding by Separation. Apr 20, 2021. Tasks: Phrase Grounding. Code available.