From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes Jun 5, 2025 3D visual grounding Object
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models Jun 28, 2024 Diversity Retrieval
Parallel Vertex Diffusion for Unified Visual Grounding Mar 13, 2023 Visual Grounding
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding Oct 31, 2024 parameter-efficient fine-tuning Visual Grounding
PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding Jul 19, 2024 3D visual grounding Attribute
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning Jun 5, 2025 Math Visual Grounding
Visual Prompting in Multimodal Large Language Models: A Survey Sep 5, 2024 In-Context Learning Prompt Learning
Context-Aware Indoor Point Cloud Object Generation through User Instructions Nov 26, 2023 Position Visual Grounding
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models Aug 15, 2024 Pose Estimation Visual Grounding
Focusing On Targets For Improving Weakly Supervised Visual Grounding Feb 22, 2023 Dependency Parsing Object
Programming with Pixels: Computer-Use Meets Software Engineering Feb 24, 2025 Visual Grounding
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts Jun 27, 2024 Decision Making Logical Reasoning
Visual Reference Resolution using Attention Memory for Visual Dialog Sep 23, 2017 Parameter Prediction Question Answering
Propagating Over Phrase Relations for One-Stage Visual Grounding Aug 1, 2020 Phrase Grounding Relational Reasoning
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding Feb 26, 2025 3D visual grounding Visual Grounding
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis Jan 17, 2025 Bayesian Inference Language Modeling
Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding Nov 5, 2024 3D visual grounding Visual Grounding
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning Mar 30, 2025 3D visual grounding Feature Splatting
FindIt: Generalized Localization with Natural Language Queries Mar 31, 2022 Natural Language Queries Object
Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics May 22, 2025 Image Captioning text similarity
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder Jul 13, 2020 Question Answering Visual Grounding
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos Jun 1, 2018 Multiple Instance Learning Sentence
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations Jan 24, 2025 Decoder Object
Referencing Where to Focus: Improving VisualGrounding with Referential Query Dec 26, 2024 Decoder Visual Grounding
Few-Shot Visual Grounding for Natural Human-Robot Interaction Mar 17, 2021 Visual Grounding
Joint Visual Grounding with Language Scene Graphs Jun 9, 2019 Referring Expression Visual Grounding
Fast visual grounding in interaction: bringing few-shot learning with neural networks to an interactive robot Jun 1, 2020 Few-Shot Learning Transfer Learning
Referring to Screen Texts with Voice Assistants Jun 10, 2023 Navigate Visual Grounding
FACET: Fairness in Computer Vision Evaluation Benchmark Aug 31, 2023 Fairness image-classification
Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog Dec 20, 2019 Audio Classification Visual Grounding
Explainable Video Entailment With Grounded Visual Evidence Jan 1, 2021 Visual Grounding
Learning to Assemble Neural Module Tree Networks for Visual Grounding Dec 8, 2018 Dependency Parsing Natural Language Visual Grounding
AIFit: Automatic 3D Human-Interpretable Feedback Models for Fitness Training Jun 19, 2021 Visual Grounding
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation Jul 9, 2025 Backdoor Attack Visual Grounding
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue Sep 26, 2024 Medical Visual Question Answering Question Answering
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models Sep 8, 2021 Concept-To-Text Generation Specificity
Revisiting Data Auditing in Large Vision-Language Models Apr 25, 2025 Visual Grounding
Revisiting Visual Grounding Apr 3, 2019 Image Retrieval Retrieval
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations Apr 10, 2025 Spatial Reasoning Visual Grounding
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Dec 6, 2024 document understanding Hallucination
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment Dec 4, 2023 Grounded language learning Language Modeling
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions Mar 5, 2025 Anomaly Detection Visual Grounding
Right Place, Right Time! Dynamizing Topological Graphs for Embodied Navigation Mar 14, 2024 Decision Making Language Modeling
Extending CLIP's Image-Text Alignment to Referring Image Segmentation Jun 14, 2023 Image Segmentation Referring Expression Segmentation
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception Jan 31, 2025 Reinforcement Learning (RL) Spatial Reasoning
RoViST: Learning Robust Metrics for Visual Storytelling Dec 17, 2021 Sentence Text Generation
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks Oct 7, 2024 Information Retrieval Language Modeling
VLMAE: Vision-Language Masked Autoencoder Aug 19, 2022 Image-text Retrieval Language Modeling
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data Oct 23, 2022 Image Captioning Image-text Retrieval
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought Jun 4, 2025 Multimodal Reasoning Reasoning Segmentation