Semantic sentence similarity: size does not always matter | Jun 16, 2021 | Grounded language learning, Image Retrieval
Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution | May 24, 2022 | Domain Adaptation, Visual Grounding
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation | Mar 31, 2020 | Knowledge Distillation, Object
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding | Jun 27, 2025 | 3D visual grounding, Natural Language Queries
Structured Preference Optimization for Vision-Language Long-Horizon Task Planning | Feb 28, 2025 | Task Planning, Visual Grounding
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery | Mar 22, 2024 | Language Modeling, Language Modelling
Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding | Mar 10, 2022 | Object, Visual Grounding
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | Feb 11, 2019 | Image Captioning, Question Answering
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding | May 24, 2024 | 3D visual grounding, Autonomous Driving
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding | Jan 1, 2025 | Referring Expression, Referring Expression Comprehension
Task-oriented Sequential Grounding in 3D Scenes | Aug 7, 2024 | 3D visual grounding, Visual Grounding
Teaching Metric Distance to Autoregressive Multimodal Foundational Models | Mar 4, 2025 | Image Generation, Visual Grounding
Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding | Jun 21, 2022 | Decoder, Question Answering
The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA | Jul 2, 2024 | Grounded Video Question Answering, Object Tracking
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding | Sep 8, 2023 | 3D Instance Segmentation, 3D visual grounding
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | May 17, 2025 | Language Modeling, Language Modelling
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases | Jul 5, 2022 | Object, Representation Learning
Towards Open-World Grasping with Large Vision-Language Models | Jun 26, 2024 | Robotic Grasping, Visual Grounding
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | Jan 3, 2024 | Question Answering, Visual Grounding
Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling
Training-Free Reasoning and Reflection in MLLMs | May 22, 2025 | Decoder, Multimodal Reasoning
Transfer Learning from Audio-Visual Grounding to Speech Recognition | Jul 9, 2019 | speech-recognition, Speech Recognition
Transformers in Vision: A Survey | Jan 4, 2021 | Action Recognition, Activity Recognition
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding | Aug 5, 2021 | 3D visual grounding, Relation
TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation | Feb 11, 2025 | Retrieval, Vision and Language Navigation
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding | May 19, 2023 | Sentence, Visual Grounding
Two Causally Related Needles in a Video Haystack | May 26, 2025 | Video Understanding, Visual Grounding
Uni3DL: Unified Model for 3D and Language Understanding | Dec 5, 2023 | Cross-Modal Retrieval, Instance Segmentation
Unified Representation Space for 3D Visual Grounding | Jun 17, 2025 | 3D visual grounding, Contrastive Learning
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | Dec 1, 2022 | 3D dense captioning, 3D visual grounding
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning | May 20, 2025 | Large Language Model, Multimodal Large Language Model
Unveiling and Mitigating Bias in Audio Visual Segmentation | Jul 23, 2024 | Attribute, Visual Grounding
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models | Jul 25, 2024 | Computational Efficiency, Question Answering
Using Multiple Instance Learning to Build Multimodal Representations | Dec 11, 2022 | Contrastive Learning, Cross-Modal Retrieval
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Nov 7, 2024 | Decoder, Language Modeling
VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Jan 1, 2025 | Large Language Model, Video Segmentation
VidLA: Video-Language Alignment at Scale | Mar 21, 2024 | Language Modelling, Visual Grounding
Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | Jan 1, 2023 | 3D visual grounding, Visual Grounding
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition | Jul 15, 2025 | 3D visual grounding, Visual Grounding
ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding | Jan 2, 2025 | 3D visual grounding, Diagnostic
VIMI: Grounding Video Generation through Multi-modal Instruction | Jul 8, 2024 | Text-to-Video Generation, Video Generation
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding | May 18, 2023 | Contrastive Learning, Object
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? | Apr 27, 2025 | Visual Grounding, Visual Storytelling
Visual Grounding Annotation of Recipe Flow Graph | May 1, 2020 | Visual Grounding
Visual grounding for desktop graphical user interfaces | May 5, 2024 | Language Modeling, Language Modelling
How direct is the link between words and images? | Jun 30, 2022 | Visual Grounding, Word Embeddings
Visual Grounding of Inter-lingual Word-Embeddings | Sep 8, 2022 | Visual Grounding, Word Embeddings
Visual Grounding of Whole Radiology Reports for 3D CT Images | Dec 8, 2023 | Segmentation, Visual Grounding
Visual Grounding Strategies for Text-Only Natural Language Processing | Mar 25, 2021 | Image Retrieval, Language Modeling