Sample-Specific Debiasing for Better Image-Text Models Apr 25, 2023 Contrastive Learning Cross-Modal Retrieval
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities Jul 1, 2024 3D visual grounding Language Modeling
Adversarial Testing for Visual Grounding via Image-Aware Property Reduction Mar 2, 2024 Visual Grounding
Scene-Intuitive Agent for Remote Embodied Visual Grounding Mar 24, 2021 cross-modal alignment Navigate
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding Jan 17, 2024 3D visual grounding Scene Understanding
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling Feb 1, 2024 Diversity Image Captioning
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs Jun 28, 2021 Question Answering Task 2
Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge Jul 5, 2024 Cross-Modal Retrieval Question Answering
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Dec 5, 2024 3D visual grounding Object Localization
Emergent Communication with World Models Feb 22, 2020 Visual Grounding
Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes Mar 24, 2025 Cross-Modal Retrieval Disentanglement
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes Jan 1, 2025 Cross-Modal Retrieval Disentanglement
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge Feb 21, 2022 Grounded language learning Image Retrieval
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding May 21, 2025 Visual Grounding
Efficient Multi-Modal Embeddings from Structured Data Oct 6, 2021 Semantic Similarity Semantic Textual Similarity
Efficient Adaptation For Remote Sensing Visual Grounding Mar 29, 2025 parameter-efficient fine-tuning Visual Grounding
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments Jun 9, 2025 Benchmarking Navigate
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues Dec 19, 2024 Change Detection Disaster Response
VQD: Visual Query Detection in Natural Scenes Apr 4, 2019 Referring Expression Referring Expression Comprehension
Semantic Localization Guiding Segment Anything Model For Reference Remote Sensing Image Segmentation Jun 12, 2025 Image Segmentation Segmentation
ACTRESS: Active Retraining for Semi-supervised Visual Grounding Jul 3, 2024 Binary Classification Visual Grounding
Semantic sentence similarity: size does not always matter Jun 16, 2021 Grounded language learning Image Retrieval
EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models Jan 6, 2025 Hallucination Visual Grounding
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding Sep 28, 2022 Decoder Visual Grounding
Dynamic Inference With Grounding Based Vision and Language Models Jan 1, 2023 Language Modelling Referring Expression
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model Aug 30, 2023 Language Modeling Language Modelling
Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding Jun 13, 2024 3D visual grounding Attribute
DSM: Building A Diverse Semantic Map for 3D Visual Grounding Apr 11, 2025 3D visual grounding Scene Understanding
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution May 24, 2022 Domain Adaptation Visual Grounding
Data-Efficient 3D Visual Grounding via Order-Aware Referring Mar 25, 2024 3D visual grounding Object
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar Mar 19, 2024 Autonomous Navigation Referring Expression
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation May 24, 2025 Mathematical Reasoning Multimodal Reasoning
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment Dec 15, 2023 3D visual grounding Natural Language Queries
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs Jun 5, 2025 cross-modal alignment Dense Captioning
Weakly-supervised segmentation of referring expressions May 10, 2022 Image Segmentation Referring Expression
Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement Oct 1, 2022 Graph Neural Network Object
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation Mar 31, 2020 Knowledge Distillation Object
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding Jun 27, 2025 3D visual grounding Natural Language Queries
Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe Sep 4, 2019 Disentanglement Visual Grounding
Structured Preference Optimization for Vision-Language Long-Horizon Task Planning Feb 28, 2025 Task Planning Visual Grounding
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery Mar 22, 2024 Language Modeling Language Modelling
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding Mar 10, 2022 Object Visual Grounding
Differentiable Disentanglement Filter: an Application Agnostic Core Concept Discovery Probe Jul 17, 2019 Disentanglement Visual Grounding
Detecting Concrete Visual Tokens for Multimodal Machine Translation Mar 5, 2024 Machine Translation Multimodal Machine Translation
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding May 8, 2025 3D visual grounding cross-modal alignment
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded Feb 11, 2019 Image Captioning Question Answering
Decoupled Spatial Temporal Graphs for Generic Visual Grounding Mar 18, 2021 Contrastive Learning Visual Grounding
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding May 24, 2024 3D visual grounding Autonomous Driving
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding Dec 2, 2021 3D dense captioning 3D visual grounding
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding Jan 1, 2025 Referring Expression Referring Expression Comprehension