Visual Grounding Helps Learn Word Meanings in Low-Data Regimes | Oct 20, 2023 | Image Captioning, Language Acquisition
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Mar 13, 2025 | Multimodal Reasoning, Question Answering
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding | Mar 16, 2022 | Language Modelling, Natural Language Queries
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding | Mar 29, 2022 | Multimodal Reasoning, Visual Grounding
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation | May 24, 2021 | Referring Expression, Referring Expression Comprehension
Smart Vision-Language Reasoners | Jul 5, 2024 | Math, Mathematical Reasoning
SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency | Oct 20, 2020 | Question Answering, Visual Grounding
Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Feb 25, 2019 | AI Agent, Question Answering
Language learning using Speech to Image retrieval | Sep 9, 2019 | Grounded language learning, Image Retrieval
Language-Guided Diffusion Model for Visual Grounding | Aug 18, 2023 | cross-modal alignment, Denoising
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Jun 26, 2025 | document understanding, Optical Character Recognition (OCR)
Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring Expression, Referring Expression Comprehension
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding | Jul 27, 2022 | Visual Grounding
Semantic sentence similarity: size does not always matter | Jun 16, 2021 | Grounded language learning, Image Retrieval
Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat | Sep 10, 2018 | Multi-Task Learning, Reinforcement Learning
Semantic query-by-example speech search using visual grounding | Apr 15, 2019 | Retrieval, Semantic Retrieval
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding
Self-view Grounding Given a Narrated 360° Video | Nov 23, 2017 | Sentence, Visual Grounding
Introspective Learning : A Two-Stage Approach for Inference in Neural Networks | Sep 17, 2022 | Active Learning, Decision Making
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering | Sep 13, 2021 | Data Augmentation, Question Answering
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Mar 13, 2024 | 3D visual grounding, cross-modal alignment
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding | Mar 23, 2023 | 3D visual grounding, Visual Grounding
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | Oct 23, 2022 | Image Captioning, Image-text Retrieval
Beyond Human Perception: Understanding Multi-Object World from Monocular View | Jan 1, 2025 | 3D visual grounding, Denoising
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge | Feb 21, 2022 | Grounded language learning, Image Retrieval
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling | Sep 9, 2024 | Language Modeling, Language Modelling
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners | Sep 7, 2023 | Diagnostic, Visual Grounding
A Better Loss for Visual-Textual Grounding | Aug 11, 2021 | Sentence, Visual Grounding
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding | May 9, 2018 | Diversity, Phrase Grounding
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding | Jan 1, 2024 | Attribute, Relation
Revisiting Visual Question Answering Baselines | Jun 27, 2016 | Binary Classification, Multiple-choice
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models | Dec 3, 2023 | Hallucination, Visual Grounding
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding | Aug 29, 2024 | Data Augmentation, Image Generation
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Apr 17, 2024 | 3D dense captioning, 3D visual grounding
RoViST:Learning Robust Metrics for Visual Storytelling | May 8, 2022 | Sentence, Text Generation
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Aug 24, 2023 | Language Modeling, Language Modelling
Deconfounded Visual Grounding | Dec 31, 2021 | Referring Expression, Visual Grounding
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models | Dec 11, 2024 | Question Answering, Visual Grounding
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models | Sep 16, 2024 | Attribute, Decoder
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding | Oct 31, 2024 | Object, Position
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation | Jul 12, 2023 | Lifelong learning, Object Detection
CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays | May 23, 2025 | Diagnostic, Question Answering
RoViST: Learning Robust Metrics for Visual Storytelling | Jul 1, 2022 | Sentence, Text Generation
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition | Jul 5, 2024 | Visual Grounding, Visual Storytelling
Language with Vision: a Study on Grounded Word and Sentence Embeddings | Jun 17, 2022 | Sentence, Sentence Embeddings
Grounding of Textual Phrases in Images by Reconstruction | Nov 12, 2015 | Language Modeling, Language Modelling
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning | Oct 17, 2023 | Segmentation, Visual Grounding
GROOViST: A Metric for Grounding Objects in Visual Storytelling | Oct 26, 2023 | Visual Grounding, Visual Storytelling
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection