Visual Grounding via Accumulated Attention. Jun 1, 2018. Tasks: Sentence, Visual Grounding. Code: unverified.
Visual Grounding with Attention-Driven Constraint Balancing. Jul 3, 2024. Tasks: Object, object-detection. Code: unverified.
Visual Intention Grounding for Egocentric Assistants. Apr 18, 2025. Tasks: Object, Visual Grounding. Code: unverified.
Visually grounded cross-lingual keyword spotting in speech. Jun 13, 2018. Tasks: Keyword Spotting, Visual Grounding. Code: unverified.
Visually Grounded Neural Syntax Acquisition. Jun 7, 2019. Tasks: Visual Grounding. Code: unverified.
Visual Prompting in Multimodal Large Language Models: A Survey. Sep 5, 2024. Tasks: In-Context Learning, Prompt Learning. Code: unverified.
Visual Reference Resolution using Attention Memory for Visual Dialog. Sep 23, 2017. Tasks: Parameter Prediction, Question Answering. Code: unverified.
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation. Jul 9, 2025. Tasks: Backdoor Attack, Visual Grounding. Code: unverified.
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks. Oct 7, 2024. Tasks: Information Retrieval, Language Modeling. Code: unverified.
VLMAE: Vision-Language Masked Autoencoder. Aug 19, 2022. Tasks: Image-text Retrieval, Language Modeling. Code: unverified.
VQD: Visual Query Detection in Natural Scenes. Apr 4, 2019. Tasks: Referring Expression, Referring Expression Comprehension. Code: unverified.
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model. Aug 30, 2023. Tasks: Language Modeling, Language Modelling. Code: unverified.
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar. Mar 19, 2024. Tasks: Autonomous Navigation, Referring Expression. Code: unverified.
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment. Dec 15, 2023. Tasks: 3D visual grounding, Natural Language Queries. Code: unverified.
Weakly-supervised segmentation of referring expressions. May 10, 2022. Tasks: Image Segmentation, Referring Expression. Code: unverified.
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures. May 3, 2017. Tasks: Sentence, Visual Grounding. Code: unverified.
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach. Jan 1, 2024. Tasks: Scene Understanding, Visual Grounding. Code: unverified.
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding. Jul 31, 2021. Tasks: Decoder, Sentence. Code: unverified.
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding. Oct 10, 2022. Tasks: Visual Grounding. Code: unverified.
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding. Mar 8, 2025. Tasks: Language Modeling, Language Modelling. Code: unverified.
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue. Sep 26, 2024. Tasks: Medical Visual Question Answering, Question Answering. Code: unverified.
Zero-Shot 3D Visual Grounding from Vision-Language Models. May 28, 2025. Tasks: 3D visual grounding, Visual Grounding. Code: unverified.
Zero-Shot Visual Grounding of Referring Utterances in Dialogue. Nov 16, 2021. Tasks: Descriptive, Visual Grounding. Code: unverified.
A Better Loss for Visual-Textual Grounding. Aug 11, 2021. Tasks: Sentence, Visual Grounding. Code: available.
Context-Infused Visual Grounding for Art. Oct 16, 2024. Tasks: object-detection, Object Detection. Code: available.
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding. Jan 1, 2025. Tasks: 3D visual grounding, Data Augmentation. Code: available.
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding. Apr 13, 2025. Tasks: 3D visual grounding, Data Augmentation. Code: available.
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network. Oct 25, 2023. Tasks: Visual Grounding. Code: available.
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models. Dec 11, 2024. Tasks: Question Answering, Visual Grounding. Code: available.
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding. Oct 31, 2024. Tasks: Object, Position. Code: available.
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding. Aug 29, 2024. Tasks: Data Augmentation, Image Generation. Code: available.
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization. Apr 17, 2024. Tasks: 3D dense captioning, 3D visual grounding. Code: available.
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding. May 9, 2018. Tasks: Diversity, Phrase Grounding. Code: available.
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language. Apr 12, 2023. Tasks: 3D visual grounding, Autonomous Driving. Code: available.
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training. Dec 3, 2023. Tasks: object-detection, Object Detection. Code: available.
AttnGrounder: Talking to Cars with Attention. Sep 11, 2020. Tasks: Referring Expression Comprehension, Visual Grounding. Code: available.
Revisiting Visual Question Answering Baselines. Jun 27, 2016. Tasks: Binary Classification, Multiple-choice. Code: available.
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition. Jul 5, 2024. Tasks: Visual Grounding, Visual Storytelling. Code: available.
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning. Oct 17, 2023. Tasks: Segmentation, Visual Grounding. Code: available.
Neural Twins Talk. Sep 26, 2020. Tasks: Image Captioning, Sentence. Code: available.
Uncovering the Full Potential of Visual Grounding Methods in VQA. Jan 15, 2024. Tasks: Question Answering, Visual Grounding. Code: available.
Connecting Vision and Language with Localized Narratives. Dec 6, 2019. Tasks: Form, Image Captioning. Code: available.
Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer. May 1, 2022. Tasks: Text Generation, Visual Grounding. Code: available.
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. Feb 7, 2022. Tasks: Image Captioning, image-classification. Code: available.
RoViST: Learning Robust Metrics for Visual Storytelling. May 8, 2022. Tasks: Sentence, Text Generation. Code: available.
RoViST: Learning Robust Metrics for Visual Storytelling. Jul 1, 2022. Tasks: Sentence, Text Generation. Code: available.
Flexible Visual Grounding. May 1, 2022. Tasks: Articles, Visual Grounding. Code: available.
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings. May 17, 2025. Tasks: Image to text, Information Retrieval. Code: available.
FiVL: A Framework for Improved Vision-Language Alignment. Dec 19, 2024. Tasks: Answer Generation, Multimodal Reasoning. Code: available.
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. Jun 6, 2016. Tasks: Phrase Grounding, Visual Grounding. Code: available.