Task-oriented Sequential Grounding in 3D Scenes Aug 7, 2024 3D visual grounding Visual Grounding
Teaching Metric Distance to Autoregressive Multimodal Foundational Models Mar 4, 2025 Image Generation Visual Grounding
Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding Jun 21, 2022 Decoder Question Answering
D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding May 30, 2025 Diversity Pseudo Label
Cycle-Consistency Learning for Captioning and Grounding Dec 23, 2023 Image Captioning Visual Grounding
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts Apr 14, 2025 Benchmarking Object
Countering Language Drift via Visual Grounding Sep 10, 2019 Language Modeling Language Modelling
The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA Jul 2, 2024 Grounded Video Question Answering Object Tracking
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding Sep 8, 2023 3D Instance Segmentation 3D visual grounding
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing May 17, 2025 Language Modeling Language Modelling
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures May 3, 2017 Sentence Visual Grounding
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis Oct 31, 2023 Descriptive Medical Image Analysis
3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment May 3, 2025 Sentence Visual Grounding
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases Jul 5, 2022 Object Representation Learning
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach Jan 1, 2024 Scene Understanding Visual Grounding
Towards Open-World Grasping with Large Vision-Language Models Jun 26, 2024 Robotic Grasping Visual Grounding
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers Jan 3, 2024 Question Answering Visual Grounding
Zero-Shot 3D Visual Grounding from Vision-Language Models May 28, 2025 3D visual grounding Visual Grounding
CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition Jan 1, 2023 Sign Language Recognition Visual Grounding
Context-Aware Command Understanding for Tabletop Scenarios Oct 8, 2024 Decision Making Visual Grounding
Towards Visual Text Grounding of Multimodal Large Language Model Apr 7, 2025 Benchmarking Language Modeling
Training-Free Reasoning and Reflection in MLLMs May 22, 2025 Decoder Multimodal Reasoning
Transfer Learning from Audio-Visual Grounding to Speech Recognition Jul 9, 2019 speech-recognition Speech Recognition
Transformers in Vision: A Survey Jan 4, 2021 Action Recognition Activity Recognition
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding Aug 5, 2021 3D visual grounding Relation
Compositional Temporal Visual Grounding of Natural Language Event Descriptions Dec 4, 2019 Visual Grounding
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary Sep 18, 2020 Autonomous Vehicles Referring Expression Comprehension
TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation Feb 11, 2025 Retrieval Vision and Language Navigation
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding May 19, 2023 Sentence Visual Grounding
Class-agnostic Object Detection Nov 28, 2020 Benchmarking Class-agnostic Object Detection
Two Causally Related Needles in a Video Haystack May 26, 2025 Video Understanding Visual Grounding
Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding Jul 31, 2021 Decoder Sentence
Uni3DL: Unified Model for 3D and Language Understanding Dec 5, 2023 Cross-Modal Retrieval Instance Segmentation
Unified Representation Space for 3D Visual Grounding Jun 17, 2025 3D visual grounding Contrastive Learning
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds Jan 1, 2021 Object Object Proposal Generation
CASTing Your Model: Learning to Localize Improves Self-Supervised Representations Dec 8, 2020 Self-Supervised Learning Visual Grounding
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds Jan 1, 2022 3D dense captioning Attribute
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation Dec 29, 2023 Visual Grounding
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding Dec 1, 2022 3D dense captioning 3D visual grounding
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning May 20, 2025 Large Language Model Multimodal Large Language Model
Unveiling and Mitigating Bias in Audio Visual Segmentation Jul 23, 2024 Attribute Visual Grounding
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Apr 26, 2024 Game Design Image Generation
Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding Mar 25, 2025 Attribute Object
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation Dec 9, 2024 3D dense captioning 3D visual grounding
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models Jul 25, 2024 Computational Efficiency Question Answering
Benchmarking Diverse-Modal Entity Linking with Generative Models May 27, 2023 Benchmarking Decoder
Using Multiple Instance Learning to Build Multimodal Representations Dec 11, 2022 Contrastive Learning Cross-Modal Retrieval
Being data-driven is not enough: Revisiting interactive instruction giving as a challenge for NLG Nov 1, 2018 Text Generation Visual Grounding
Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution Jun 18, 2022 Visual Grounding
Bayesian Self-Training for Semi-Supervised 3D Segmentation Sep 12, 2024 3D Instance Segmentation 3D Semantic Segmentation