ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding Mar 23, 2023 3D visual grounding Visual Grounding
You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding Feb 12, 2019 object-detection Object Detection
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding May 7, 2025 3D visual grounding Graph Attention
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach Oct 3, 2022 Referring Expression Robot Manipulation
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention Mar 13, 2024 3D visual grounding cross-modal alignment
Finding beans in burgers: Deep semantic-visual embedding with localization Apr 5, 2018 Cross-Modal Retrieval Image Captioning
Few-Shot Multimodal Explanation for Visual Question Answering Oct 28, 2024 Explainable artificial intelligence Explainable Artificial Intelligence (XAI)
Multi-Attribute Interactions Matter for 3D Visual Grounding Jan 1, 2024 3D visual grounding Attribute
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model May 26, 2025 Diagnostic Reinforcement Learning (RL)
Composing Pick-and-Place Tasks By Grounding Language Feb 16, 2021 Natural Language Visual Grounding Robotic Grasping
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model Jul 7, 2024 Segmentation Sentence
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering Sep 30, 2024 Optical Character Recognition (OCR) Question Answering
Modularized Textual Grounding for Counterfactual Resilience Apr 7, 2019 Attribute counterfactual
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment Dec 5, 2023 Explanation Generation Visual Grounding
Measuring Faithful and Plausible Visual Grounding in VQA May 24, 2023 Question Answering Visual Grounding
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs Oct 16, 2024 Visual Grounding
Self-view Grounding Given a Narrated 360° Video Nov 23, 2017 Sentence Visual Grounding
Dual Attention Networks for Visual Reference Resolution in Visual Dialog Feb 25, 2019 AI Agent Question Answering
Semantic query-by-example speech search using visual grounding Apr 15, 2019 Retrieval Semantic Retrieval
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images Jun 26, 2025 document understanding Optical Character Recognition (OCR)
MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing Mar 31, 2025 Object object-detection
M^3D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction Dec 5, 2024 Relation Extraction Visual Grounding
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI May 9, 2025 4k Domain Generalization
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling Sep 9, 2024 Language Modeling Language Modelling
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations Jun 4, 2023 Visual Grounding Word Embeddings
LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation Mar 18, 2025 Decoder Object
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering Sep 13, 2021 Data Augmentation Question Answering
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Apr 11, 2017 Image-text matching Retrieval
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding Jul 27, 2022 Visual Grounding
Learning to ground medical text in a 3D human atlas Nov 1, 2020 Phrase Grounding Visual Grounding
Smart Vision-Language Reasoners Jul 5, 2024 Math Mathematical Reasoning
Learning semantic sentence representations from visually grounded language without lexical knowledge Mar 27, 2019 Grounded language learning Learning Semantic Representations
SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency Oct 20, 2020 Question Answering Visual Grounding
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation May 24, 2021 Referring Expression Referring Expression Comprehension
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners Sep 7, 2023 Diagnostic Visual Grounding
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models May 16, 2024 Adversarial Attack Adversarial Robustness
Language with Vision: a Study on Grounded Word and Sentence Embeddings Jun 17, 2022 Sentence Sentence Embeddings
Adaptive Masking Enhances Visual Grounding Oct 4, 2024 Few-Shot Learning Visual Grounding
Deconfounded Visual Grounding Dec 31, 2021 Referring Expression Visual Grounding
Visually Grounded VQA by Lattice-based Retrieval Nov 15, 2022 Information Retrieval Question Answering
Language-Guided Diffusion Model for Visual Grounding Aug 18, 2023 cross-modal alignment Denoising
Language Adaptive Weight Generation for Multi-task Visual Grounding Jun 6, 2023 Referring Expression Referring Expression Comprehension
Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat Sep 10, 2018 Multi-Task Learning Reinforcement Learning
Collecting Visually-Grounded Dialogue with A Game Of Sorts Sep 10, 2023 Coreference Resolution Image Retrieval
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions Oct 18, 2023 Benchmarking Visual Grounding
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays May 23, 2025 Diagnostic Question Answering
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding Jan 1, 2024 Attribute Relation
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models Dec 22, 2024 Language Modeling Language Modelling
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation May 16, 2025 Benchmarking Ethics
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach Feb 3, 2021 Question Answering Visual Grounding