Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving May 13, 2025 3D visual grounding Autonomous Driving
Collaborative Transformers for Grounded Situation Recognition Mar 30, 2022 Grounded Situation Recognition Image Classification
Multi-branch Collaborative Learning Network for 3D Visual Grounding Jul 7, 2024 3D visual grounding Referring Expression
Fine-Grained Semantically Aligned Vision-Language Pre-Training Aug 4, 2022 cross-modal alignment object-detection
Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding Jan 1, 2023 Descriptive Object
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation Sep 17, 2021 Dialogue Generation Visual Grounding
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model Jan 21, 2025 Hallucination Image Captioning
Visual Grounding for Object-Level Generalization in Reinforcement Learning Aug 4, 2024 Language Modelling Object
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection Nov 5, 2023 Anomaly Detection Question Answering
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Mar 19, 2024 Reinforcement Learning (RL) Visual Grounding
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections May 24, 2022 Computational Efficiency cross-modal alignment
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training Jan 1, 2023 3D dense captioning 3D visual grounding
3D Vision and Language Pretraining with Large-Scale Synthetic Data Jul 8, 2024 Dense Captioning Diversity
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding
Context Disentangling and Prototype Inheriting for Robust Visual Grounding Dec 19, 2023 Visual Grounding
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding May 15, 2023 Diversity Transfer Learning
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning Apr 30, 2022 Attribute Decoder
Position-guided Text Prompt for Vision-Language Pre-training Dec 19, 2022 Cross-Modal Retrieval Image Captioning
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision Dec 14, 2021 Contrastive Learning Representation Learning
Mask Grounding for Referring Image Segmentation Dec 19, 2023 cross-modal alignment Image Segmentation
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding Mar 5, 2024 3D visual grounding Decision Making
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation Jun 11, 2024 Grounded Multimodal Named Entity Recognition named-entity-recognition
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation Jul 3, 2020 Contrastive Learning Knowledge Distillation
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection Feb 3, 2025 3D visual grounding Visual Grounding
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding Nov 25, 2022 3D visual grounding Knowledge Distillation
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data Oct 28, 2023 3D visual grounding Autonomous Vehicles
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models Dec 6, 2023 Autonomous Driving Autonomous Vehicles
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution Oct 1, 2022 coreference-resolution Coreference Resolution
Local-Global Context Aware Transformer for Language-Guided Video Segmentation Mar 18, 2022 Referring Expression Segmentation Referring Video Object Segmentation
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding Apr 9, 2021 Descriptive Object
REX: Reasoning-aware and Grounded Explanation Mar 11, 2022 Decision Making Explanation Generation
SAT: 2D Semantics Assisted Training for 3D Visual Grounding May 24, 2021 3D visual grounding Object
Grounded Situation Recognition with Transformers Nov 19, 2021 Decoder Grounded Situation Recognition
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models Sep 24, 2021 Visual Grounding
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling Mar 21, 2024 Grounded language learning Language Acquisition
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Aug 29, 2022 cross-modal alignment Image-text Retrieval
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection Apr 13, 2022 3D visual grounding Visual Grounding
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Sep 29, 2022 3D visual grounding Object
Guessing State Tracking for Visual Dialogue Feb 24, 2020 Visual Grounding
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation Jul 1, 2024 Image-text Retrieval Question Answering
Learning Cross-modal Context Graph for Visual Grounding Feb 13, 2020 Graph Matching Graph Neural Network
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations Jun 30, 2022 Language Modeling Language Modelling
Learning Cross-modal Context Graph for Visual Grounding Nov 20, 2019 Graph Matching Graph Neural Network
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition Feb 15, 2024 Grounded Multimodal Named Entity Recognition Multi-modal Named Entity Recognition
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation Apr 5, 2021 Object Visual Grounding
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding Oct 22, 2022 3D visual grounding Sentence
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding Aug 2, 2024 Decoder Reasoning Segmentation
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection Apr 3, 2025 Instruction Following Language Modeling
OCID-Ref: A 3D Robotic Dataset with Embodied Language for Clutter Scene Grounding Mar 13, 2021 Referring Expression Referring Expression Segmentation
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding Mar 29, 2022 Multimodal Reasoning Visual Grounding