World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering Sep 30, 2024 Optical Character Recognition (OCR) Question Answering Code Code Available 05
You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding Feb 12, 2019 object-detection Object Detection Code Code Available 05
MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs Jun 2, 2025 Instruction Following Text Generation — Unverified 00
Visual Intention Grounding for Egocentric Assistants Apr 18, 2025 Object Visual Grounding — Unverified 00
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation Jan 28, 2017 Response Generation Retrieval — Unverified 00
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models May 23, 2025 Diagnostic Hallucination — Unverified 00
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level Nov 15, 2024 Benchmarking counterfactual — Unverified 00
Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining Apr 20, 2023 Visual Grounding — Unverified 00
Image Difference Grounding with Natural Language Apr 2, 2025 Visual Grounding — Unverified 00
Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search Jul 1, 2018 General Classification Image Retrieval — Unverified 00
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation May 29, 2025 Question Answering RAG — Unverified 00
HPE-CogVLM: Advancing Vision Language Models with a Head Pose Grounding Task Jun 4, 2024 Head Pose Estimation Language Modelling — Unverified 00
Visually grounded cross-lingual keyword spotting in speech Jun 13, 2018 Keyword Spotting Visual Grounding — Unverified 00
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model Jun 1, 2024 Action Recognition Activity Recognition — Unverified 00
HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation Jun 26, 2025 counterfactual Counterfactual Reasoning — Unverified 00
Multi-Granularity Modularized Network for Abstract Visual Reasoning Jul 9, 2020 Visual Grounding Visual Reasoning — Unverified 00
Visually Grounded Neural Syntax Acquisition Jun 7, 2019 Visual Grounding — Unverified 00
Guiding Visual Question Answering with Attention Priors May 25, 2022 Question Answering Visual Grounding — Unverified 00
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Jun 3, 2025 Visual Grounding — Unverified 00
Multimodal Reference Visual Grounding Apr 2, 2025 Few-Shot Object Detection Visual Grounding — Unverified 00
Multimodal Unified Attention Networks for Vision-and-Language Interactions Aug 12, 2019 Question Answering Visual Grounding — Unverified 00
Multi-task Learning of Hierarchical Vision-Language Representation Dec 3, 2018 Multi-Task Learning Question Answering — Unverified 00
GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance Oct 9, 2024 Visual Grounding — Unverified 00
GroundFlow: A Plug-in Module for Temporal Reasoning on 3D Point Cloud Sequential Grounding Jun 26, 2025 3D visual grounding Large Language Model — Unverified 00
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar Aug 30, 2024 Autonomous Driving Visual Grounding — Unverified 00
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners Apr 30, 2024 3D visual grounding Visual Grounding — Unverified 00
GroundCap: A Visually Grounded Image Captioning Dataset Feb 19, 2025 Image Captioning Object Detection — Unverified 00
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models Oct 21, 2024 Instruction Following object-detection — Unverified 00
Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics Oct 10, 2024 Visual Grounding — Unverified 00
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations Feb 2, 2024 Contrastive Learning Object — Unverified 00
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding Mar 8, 2025 Language Modeling Language Modelling — Unverified 00
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding Mar 19, 2020 Object Referring Expression Comprehension — Unverified 00
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding Jul 9, 2025 3D visual grounding Autonomous Navigation — Unverified 00
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment Mar 27, 2019 Image Retrieval Phrase Grounding — Unverified 00
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving Mar 28, 2025 3D visual grounding Autonomous Driving — Unverified 00
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection Sep 18, 2023 3D Object Detection 3D Open-Vocabulary Object Detection — Unverified 00
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing Jan 12, 2025 Image Captioning Language Modeling — Unverified 00
OG: Equip vision occupancy with instance segmentation and visual grounding Jul 12, 2023 Instance Segmentation Segmentation — Unverified 00
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web Feb 27, 2024 Language Modeling Language Modelling — Unverified 00
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding Jan 1, 2024 Scene Understanding Visual Grounding — Unverified 00
GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning Jun 22, 2025 Answer Generation Decision Making — Unverified 00
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval Apr 24, 2019 Retrieval Visual Grounding — Unverified 00
On the Role of Visual Grounding in VQA Jun 26, 2024 Visual Grounding Visual Question Answering (VQA) — Unverified 00
GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting Dec 18, 2024 Scene Understanding Semantic Segmentation — Unverified 00
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models Jul 18, 2024 3D Semantic Segmentation Semantic Segmentation — Unverified 00
OptiBox: Breaking the Limits of Proposals for Visual Grounding Nov 29, 2019 Image Captioning Visual Grounding — Unverified 00
GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks Jan 1, 2023 Image Generation Image-text Retrieval — Unverified 00
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization Oct 8, 2018 Question Answering Visual Grounding — Unverified 00
G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding Jan 1, 2024 3D visual grounding Visual Grounding — Unverified 00
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding Dec 1, 2024 Visual Grounding — Unverified 00