Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations Jun 30, 2022 Language Modeling Language Modelling
Code Code Available 1Improving One-stage Visual Grounding by Recursive Sub-query Construction Aug 3, 2020 Sentence Sentence Embedding
Code Code Available 1Multi3DRefer: Grounding Text Description to Multiple 3D Objects Sep 11, 2023 3D visual grounding Contrastive Learning
Code Code Available 1Fine-Grained Semantically Aligned Vision-Language Pre-Training Aug 4, 2022 cross-modal alignment object-detection
Code Code Available 1Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding Jan 1, 2023 Descriptive Object
Code Code Available 1Multi-Modal Dynamic Graph Transformer for Visual Grounding Jan 1, 2022 Visual Grounding
Code Code Available 1PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model Jan 21, 2025 Hallucination Image Captioning
Code Code Available 1Visual Grounding for Object-Level Generalization in Reinforcement Learning Aug 4, 2024 Language Modelling Object
Code Code Available 1Mono3DVG: 3D Visual Grounding in Monocular Images Dec 13, 2023 3D Object Detection 3D visual grounding
Code Code Available 1Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving May 13, 2025 3D visual grounding Autonomous Driving
Code Code Available 1Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Nov 10, 2023 Diversity Multi-Task Learning
Code Code Available 1Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training Jan 1, 2023 3D dense captioning 3D visual grounding
Code Code Available 1Collaborative Transformers for Grounded Situation Recognition Mar 30, 2022 Grounded Situation Recognition Image Classification
Code Code Available 1GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection Nov 5, 2023 Anomaly Detection Question Answering
Code Code Available 1How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game Mar 13, 2025 Multimodal Reasoning Question Answering
Code Code Available 1Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs Jan 11, 2025 Math Mathematical Problem-Solving
Code Code Available 1IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities Aug 23, 2024 Language Modeling Language Modelling
Code Code Available 1Mask Grounding for Referring Image Segmentation Dec 19, 2023 cross-modal alignment Image Segmentation
Code Code Available 1Position-guided Text Prompt for Vision-Language Pre-training Dec 19, 2022 Cross-Modal Retrieval Image Captioning
Code Code Available 1PROGrasp: Pragmatic Human-Robot Communication for Object Grasping Sep 14, 2023 Object Object Discovery
Code Code Available 1CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding May 15, 2023 Diversity Transfer Learning
Code Code Available 1MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding
Code Code Available 1CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision Dec 14, 2021 Contrastive Learning Representation Learning
Code Code Available 1Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding Apr 9, 2021 Descriptive Object
Code Code Available 1Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation Jun 11, 2024 Grounded Multimodal Named Entity Recognition named-entity-recognition
Code Code Available 1Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection Feb 3, 2025 3D visual grounding Visual Grounding
Code Code Available 1GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models Dec 6, 2023 Autonomous Driving Autonomous Vehicles
Code Code Available 1GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution Oct 1, 2022 coreference-resolution Coreference Resolution
Code Code Available 1Local-Global Context Aware Transformer for Language-Guided Video Segmentation Mar 18, 2022 Referring Expression Segmentation Referring Video Object Segmentation
Code Code Available 1CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data Oct 28, 2023 3D visual grounding Autonomous Vehicles
Code Code Available 1Instruction-Guided Visual Masking May 30, 2024 Instruction Following Visual Grounding
Code Code Available 1REX: Reasoning-aware and Grounded Explanation Mar 11, 2022 Decision Making Explanation Generation
Code Code Available 1Grounded Situation Recognition with Transformers Nov 19, 2021 Decoder Grounded Situation Recognition
Code Code Available 1CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models Sep 24, 2021 Visual Grounding
Code Code Available 1Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding Nov 25, 2022 3D visual grounding Knowledge Distillation
Code Code Available 1MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding Mar 5, 2024 3D visual grounding Decision Making
Code Code Available 1Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling Mar 21, 2024 Grounded language learning Language Acquisition
Code Code Available 1Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Aug 29, 2022 cross-modal alignment Image-text Retrieval
Code Code Available 1Guessing State Tracking for Visual Dialogue Feb 24, 2020 Visual Grounding
Code Code Available 1CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation Jul 1, 2024 Image-text Retrieval Question Answering
Code Code Available 13D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection Apr 13, 2022 3D visual grounding Visual Grounding
Code Code Available 1EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Sep 29, 2022 3D visual grounding Object
Code Code Available 1Learning Cross-modal Context Graph for Visual Grounding Feb 13, 2020 Graph Matching Graph Neural Network
Code Code Available 1Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning Apr 30, 2022 Attribute Decoder
Code Code Available 1Learning Cross-modal Context Graph for Visual Grounding Nov 20, 2019 Graph Matching Graph Neural Network
Code Code Available 1Learning Point-Language Hierarchical Alignment for 3D Visual Grounding Oct 22, 2022 3D visual grounding Sentence
Code Code Available 1An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding Aug 2, 2024 Decoder Reasoning Segmentation
Code Code Available 1Spatially Aware Multimodal Transformers for TextVQA Jul 23, 2020 Optical Character Recognition (OCR) Spatial Reasoning
Code Code Available 1LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition Feb 15, 2024 Grounded Multimodal Named Entity Recognition Multi-modal Named Entity Recognition
Code Code Available 1Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding Mar 29, 2022 Multimodal Reasoning Visual Grounding
Code Code Available 1