Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation May 24, 2021 Referring Expression Referring Expression Comprehension
Learning semantic sentence representations from visually grounded language without lexical knowledge Mar 27, 2019 Grounded language learning Learning Semantic Representations
Learning to ground medical text in a 3D human atlas Nov 1, 2020 Phrase Grounding Visual Grounding
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Apr 11, 2017 Image-text matching Retrieval
LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation Mar 18, 2025 Decoder Object
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations Jun 4, 2023 Visual Grounding Word Embeddings
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI May 9, 2025 4k Domain Generalization
M^3D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction Dec 5, 2024 Relation Extraction Visual Grounding
MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing Mar 31, 2025 Object object-detection
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs Oct 16, 2024 Visual Grounding
Measuring Faithful and Plausible Visual Grounding in VQA May 24, 2023 Question Answering Visual Grounding
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment Dec 5, 2023 Explanation Generation Visual Grounding
Modularized Textual Grounding for Counterfactual Resilience Apr 7, 2019 Attribute counterfactual
Multi-Attribute Interactions Matter for 3D Visual Grounding Jan 1, 2024 3D visual grounding Attribute
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Jun 6, 2016 Phrase Grounding Visual Grounding
Neural Twins Talk Sep 26, 2020 Image Captioning Sentence
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning Oct 17, 2023 Segmentation Visual Grounding
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition Jul 5, 2024 Visual Grounding Visual Storytelling
Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding Oct 31, 2024 Object Position
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models Dec 11, 2024 Question Answering Visual Grounding
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding Aug 29, 2024 Data Augmentation Image Generation
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization Apr 17, 2024 3D dense captioning 3D visual grounding
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding May 9, 2018 Diversity Phrase Grounding
Revisiting Visual Question Answering Baselines Jun 27, 2016 Binary Classification Multiple-choice
RoViST: Learning Robust Metrics for Visual Storytelling May 8, 2022 Sentence Text Generation
RoViST: Learning Robust Metrics for Visual Storytelling Jul 1, 2022 Sentence Text Generation
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding Mar 23, 2023 3D visual grounding Visual Grounding
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention Mar 13, 2024 3D visual grounding cross-modal alignment
Self-view Grounding Given a Narrated 360° Video Nov 23, 2017 Sentence Visual Grounding
Semantic query-by-example speech search using visual grounding Apr 15, 2019 Retrieval Semantic Retrieval
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling Sep 9, 2024 Language Modeling Language Modelling
SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding Jul 27, 2022 Visual Grounding
Smart Vision-Language Reasoners Jul 5, 2024 Math Mathematical Reasoning
SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency Oct 20, 2020 Question Answering Visual Grounding
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo Mar 30, 2022 Benchmarking Person-centric Visual Grounding
To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo May 1, 2022 Benchmarking Person-centric Visual Grounding
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks Jan 12, 2023 Cross-Modal Retrieval Open-Ended Question Answering
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency Jan 1, 2024 3D visual grounding Relation
Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities Apr 2, 2025 Descriptive Large Language Model
Uncovering the Full Potential of Visual Grounding Methods in VQA Jan 15, 2024 Question Answering Visual Grounding
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework Feb 7, 2022 Image Captioning image-classification
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings May 17, 2025 Image to text Information Retrieval
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model May 26, 2025 Diagnostic Reinforcement Learning (RL)
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling Feb 9, 2024 Hallucination Natural Language Understanding
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models Nov 21, 2023 Image Segmentation Language Modelling
Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset Nov 21, 2024 Question Answering Visual Grounding
Visual Coreference Resolution in Visual Dialog using Neural Module Networks Sep 6, 2018 Common Sense Reasoning coreference-resolution
Visually Grounded VQA by Lattice-based Retrieval Nov 15, 2022 Information Retrieval Question Answering
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes Nov 22, 2015 Common Sense Reasoning Image Retrieval
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language Apr 12, 2023 3D visual grounding Autonomous Driving