Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses | Dec 11, 2024 | Image-text Retrieval, Question Answering
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Nov 7, 2024 | Decoder, Language Modeling
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | Jan 1, 2025 | Large Language Model, Video Segmentation
A Visual Tour Of Current Challenges In Multimodal Language Models | Oct 22, 2022 | Image Generation, Text to Image Generation
VidLA: Video-Language Alignment at Scale | Mar 21, 2024 | Language Modelling, Visual Grounding
Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression
A Vision Centric Remote Sensing Benchmark | Mar 20, 2025 | Question Answering, Representation Learning
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | Jan 1, 2023 | 3D visual grounding, Visual Grounding
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition | Jul 15, 2025 | 3D visual grounding, Visual Grounding
ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding | Jan 2, 2025 | 3D visual grounding, Diagnostic
3D Scene Graph Guided Vision-Language Pre-training | Nov 27, 2024 | 3D dense captioning, 3D visual grounding
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding | Oct 10, 2022 | Visual Grounding
AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring | Jan 16, 2025 | 3D visual grounding, Decoder
VIMI: Grounding Video Generation through Multi-modal Instruction | Jul 8, 2024 | Text-to-Video Generation, Video Generation
Attention-Based Keyword Localisation in Speech using Visual Grounding | Jun 16, 2021 | Visual Grounding
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding | May 18, 2023 | Contrastive Learning, Object
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? | Apr 27, 2025 | Visual Grounding, Visual Storytelling
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding | Jul 25, 2023 | 3D visual grounding, Object
Zero-Shot Visual Grounding of Referring Utterances in Dialogue | Nov 16, 2021 | Descriptive, Visual Grounding
Visual Grounding Annotation of Recipe Flow Graph | May 1, 2020 | Visual Grounding
Learning from Synthetic Data for Visual Grounding | Mar 20, 2024 | Language Modelling, Large Language Model
Visually Consistent Hierarchical Image Classification | Jun 17, 2024 | Classification, image-classification
Learning Language Structures through Grounding | Jun 14, 2024 | Automatic Speech Recognition, Dependency Parsing
Visual grounding for desktop graphical user interfaces | May 5, 2024 | Language Modeling, Language Modelling
Learning to Compose and Reason with Language Tree Structures for Visual Grounding | Jun 5, 2019 | Visual Grounding, Visual Reasoning
Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer | Oct 16, 2021 | Text Generation, Visual Grounding
Learning to Ground VLMs without Forgetting | Oct 14, 2024 | Decoder, Language Modelling
Attending Self-Attention: A Case Study of Visually Grounded Supervision in Vision-and-Language Transformers | Aug 1, 2021 | Language Modeling, Language Modelling
Learning Unsupervised Visual Grounding Through Semantic Self-Supervision | Mar 17, 2018 | Visual Grounding
Learning Visual Grounding from Generative Vision and Language Model | Jul 18, 2024 | Attribute, Language Modeling
Learning with Difference Attention for Visually Grounded Self-supervised Representations | Jun 26, 2023 | Self-Supervised Learning, Visual Grounding
How direct is the link between words and images? | Jun 30, 2022 | Visual Grounding, Word Embeddings
Less is More: Generating Grounded Navigation Instructions from Landmarks | Nov 25, 2021 | Decoder, Instruction Following
Visual Grounding of Inter-lingual Word-Embeddings | Sep 8, 2022 | Visual Grounding, Word Embeddings
Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring | Feb 16, 2025 | Instance Segmentation, Language Modeling
Leveraging Past References for Robust Language Grounding | Nov 1, 2019 | Object, Referring Expression
A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | Jan 29, 2024 | Language Modeling, Language Modelling
LanguageRefer: Spatial-Language Model for 3D Visual Grounding | Jul 7, 2021 | 3D visual grounding, Language Modeling
LidaRefer: Outdoor 3D Visual Grounding for Autonomous Driving with Transformers | Nov 7, 2024 | 3D visual grounding, Autonomous Driving
Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning, In-Context Learning
Like a bilingual baby: The advantage of visually grounding a bilingual language model | Oct 11, 2022 | Language Modeling, Language Modelling
Language learning using Speech to Image retrieval | Sep 9, 2019 | Grounded language learning, Image Retrieval
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object Detection, Autonomous Driving
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | May 27, 2024 | Visual Grounding
Knowledge Supports Visual Language Grounding: A Case Study on Colour Terms | Jul 1, 2020 | Diagnostic, Object
Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding | Oct 21, 2024 | 3D visual grounding, Object
I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | Jun 17, 2025 | 3D visual grounding, Contrastive Learning
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter | Aug 25, 2021 | Blocking, Object
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | Jan 1, 2024 | Image Segmentation, Semantic Segmentation