Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe Sep 4, 2019 Disentanglement Visual Grounding
Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement Oct 1, 2022 Graph Neural Network Object
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs Jun 5, 2025 cross-modal alignment Dense Captioning
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation May 24, 2025 Mathematical Reasoning Multimodal Reasoning
Data-Efficient 3D Visual Grounding via Order-Aware Referring Mar 25, 2024 3D visual grounding Object
DSM: Building A Diverse Semantic Map for 3D Visual Grounding Apr 11, 2025 3D visual grounding Scene Understanding
Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding Jun 13, 2024 3D visual grounding Attribute
Dynamic Inference With Grounding Based Vision and Language Models Jan 1, 2023 Language Modelling Referring Expression
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding Sep 28, 2022 Decoder Visual Grounding
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models Jan 6, 2025 Hallucination Visual Grounding
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues Dec 19, 2024 Change Detection Disaster Response
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments Jun 9, 2025 Benchmarking Navigate
Efficient Adaptation For Remote Sensing Visual Grounding Mar 29, 2025 parameter-efficient fine-tuning Visual Grounding
Efficient Multi-Modal Embeddings from Structured Data Oct 6, 2021 Semantic Similarity Semantic Textual Similarity
Emergent Communication with World Models Feb 22, 2020 Visual Grounding
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities Jul 1, 2024 3D visual grounding Language Modeling
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions Mar 5, 2025 Anomaly Detection Visual Grounding
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment Dec 4, 2023 Grounded language learning Language Modeling
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Dec 6, 2024 document understanding Hallucination
Learning to Assemble Neural Module Tree Networks for Visual Grounding Dec 8, 2018 Dependency Parsing Natural Language Visual Grounding
Explainable Video Entailment With Grounded Visual Evidence Jan 1, 2021 Visual Grounding
FACET: Fairness in Computer Vision Evaluation Benchmark Aug 31, 2023 Fairness image-classification
Fast visual grounding in interaction: bringing few-shot learning with neural networks to an interactive robot Jun 1, 2020 Few-Shot Learning Transfer Learning
Few-Shot Visual Grounding for Natural Human-Robot Interaction Mar 17, 2021 Visual Grounding
Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos Jun 1, 2018 Multiple Instance Learning Sentence
FindIt: Generalized Localization with Natural Language Queries Mar 31, 2022 Natural Language Queries Object
Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding Nov 5, 2024 3D visual grounding Visual Grounding
FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis Jan 17, 2025 Bayesian Inference Language Modeling
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts Jun 27, 2024 Decision Making Logical Reasoning
Focusing On Targets For Improving Weakly Supervised Visual Grounding Feb 22, 2023 Dependency Parsing Object
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models Jun 28, 2024 Diversity Retrieval
From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes Jun 5, 2025 3D visual grounding Object
G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding Jan 1, 2024 3D visual grounding Visual Grounding
GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks Jan 1, 2023 Image Generation Image-text Retrieval
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting Dec 18, 2024 Scene Understanding Semantic Segmentation
GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning Jun 22, 2025 Answer Generation Decision Making
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing Jan 12, 2025 Image Captioning Language Modeling
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding Mar 19, 2020 Object Referring Expression Comprehension
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models Oct 21, 2024 Instruction Following object-detection
GroundCap: A Visually Grounded Image Captioning Dataset Feb 19, 2025 Image Captioning Object Detection
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding Jun 26, 2025 3D visual grounding Large Language Model
GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance Oct 9, 2024 Visual Grounding
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Jun 3, 2025 Visual Grounding
Guiding Visual Question Answering with Attention Priors May 25, 2022 Question Answering Visual Grounding
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation Jun 26, 2025 counterfactual Counterfactual Reasoning
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model Jun 1, 2024 Action Recognition Activity Recognition
HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task Jun 4, 2024 Head Pose Estimation Language Modelling
Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search Jul 1, 2018 General Classification Image Retrieval
Image Difference Grounding with Natural Language Apr 2, 2025 Visual Grounding
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation Jan 28, 2017 Response Generation Retrieval