Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners Apr 30, 2024 3D visual grounding Visual Grounding
Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics Oct 10, 2024 Visual Grounding
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations Feb 2, 2024 Contrastive Learning Object
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving Mar 28, 2025 3D visual grounding Autonomous Driving
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection Sep 18, 2023 3D Object Detection 3D Open-Vocabulary Object Detection
OG: Equip vision occupancy with instance segmentation and visual grounding Jul 12, 2023 Instance Segmentation Segmentation
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Feb 27, 2024 Language Modeling Language Modelling
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding Jan 1, 2024 Scene Understanding Visual Grounding
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval Apr 24, 2019 Retrieval Visual Grounding
On the Role of Visual Grounding in VQA Jun 26, 2024 Visual Grounding Visual Question Answering (VQA)
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models Jul 18, 2024 3D Semantic Segmentation Semantic Segmentation
OptiBox: Breaking the Limits of Proposals for Visual Grounding Nov 29, 2019 Image Captioning Visual Grounding
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization Oct 8, 2018 Question Answering Visual Grounding
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding Dec 1, 2024 Visual Grounding
Parallel Vertex Diffusion for Unified Visual Grounding Mar 13, 2023 Visual Grounding
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding Oct 31, 2024 parameter-efficient fine-tuning Visual Grounding
PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding Jul 19, 2024 3D visual grounding Attribute
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning Jun 5, 2025 Math Visual Grounding
Context-Aware Indoor Point Cloud Object Generation through User Instructions Nov 26, 2023 Position Visual Grounding
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models Aug 15, 2024 Pose Estimation Visual Grounding
Programming with Pixels: Computer-Use Meets Software Engineering Feb 24, 2025 Visual Grounding
Propagating Over Phrase Relations for One-Stage Visual Grounding Aug 1, 2020 Phrase Grounding Relational Reasoning
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding Feb 26, 2025 3D visual grounding Visual Grounding
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning Mar 30, 2025 3D visual grounding Feature Splatting
Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics May 22, 2025 Image Captioning text similarity
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder Jul 13, 2020 Question Answering Visual Grounding
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations Jan 24, 2025 Decoder Object
Referencing Where to Focus: Improving VisualGrounding with Referential Query Dec 26, 2024 Decoder Visual Grounding
Joint Visual Grounding with Language Scene Graphs Jun 9, 2019 Referring Expression Visual Grounding
Referring to Screen Texts with Voice Assistants Jun 10, 2023 Navigate Visual Grounding
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models Sep 8, 2021 Concept-To-Text Generation Specificity
Revisiting Data Auditing in Large Vision-Language Models Apr 25, 2025 Visual Grounding
Revisiting Visual Grounding Apr 3, 2019 Image Retrieval Retrieval
Right Place, Right Time! Dynamizing Topological Graphs for Embodied Navigation Mar 14, 2024 Decision Making Language Modeling
Extending CLIP's Image-Text Alignment to Referring Image Segmentation Jun 14, 2023 Image Segmentation Referring Expression Segmentation
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception Jan 31, 2025 Reinforcement Learning (RL) Spatial Reasoning
RoViST: Learning Robust Metrics for Visual Storytelling Dec 17, 2021 Sentence Text Generation
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data Oct 23, 2022 Image Captioning Image-text Retrieval
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought Jun 4, 2025 Multimodal Reasoning Reasoning Segmentation
Sample-Specific Debiasing for Better Image-Text Models Apr 25, 2023 Contrastive Learning Cross-Modal Retrieval
Scene-Intuitive Agent for Remote Embodied Visual Grounding Mar 24, 2021 cross-modal alignment Navigate
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding Jan 17, 2024 3D visual grounding Scene Understanding
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling Feb 1, 2024 Diversity Image Captioning
Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge Jul 5, 2024 Cross-Modal Retrieval Question Answering
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding Dec 5, 2024 3D visual grounding Object Localization
Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes Mar 24, 2025 Cross-Modal Retrieval Disentanglement
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes Jan 1, 2025 Cross-Modal Retrieval Disentanglement
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge Feb 21, 2022 Grounded language learning Image Retrieval
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding May 21, 2025 Visual Grounding
Semantic Localization Guiding Segment Anything Model For Reference Remote Sensing Image Segmentation Jun 12, 2025 Image Segmentation Segmentation