Collaborative Transformers for Grounded Situation Recognition Mar 30, 2022 Grounded Situation Recognition Image Classification
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection Nov 5, 2023 Anomaly Detection Question Answering
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models Oct 9, 2023 Language Modelling Question Answering
Fine-Grained Semantically Aligned Vision-Language Pre-Training Aug 4, 2022 cross-modal alignment object-detection
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning
REX: Reasoning-aware and Grounded Explanation Mar 11, 2022 Decision Making Explanation Generation
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model Jan 21, 2025 Hallucination Image Captioning
Visual Grounding for Object-Level Generalization in Reinforcement Learning Aug 4, 2024 Language Modelling Object
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images Mar 14, 2021 3D visual grounding Object
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding Mar 24, 2021 Object Relation
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Nov 10, 2023 Diversity Multi-Task Learning
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training Jan 1, 2023 3D dense captioning 3D visual grounding
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation Jul 3, 2020 Contrastive Learning Knowledge Distillation
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding May 15, 2023 Diversity Transfer Learning
Context Disentangling and Prototype Inheriting for Robust Visual Grounding Dec 19, 2023 Visual Grounding
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision Dec 14, 2021 Contrastive Learning Representation Learning
InfMLLM: A Unified Framework for Visual-Language Tasks Nov 12, 2023 GPU Image Captioning
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation Jun 11, 2024 Grounded Multimodal Named Entity Recognition named-entity-recognition
Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems Nov 21, 2024 3D visual grounding Negation
Learning Cross-modal Context Graph for Visual Grounding Nov 20, 2019 Graph Matching Graph Neural Network
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding Feb 24, 2025 cross-modal alignment Visual Grounding
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection Feb 3, 2025 3D visual grounding Visual Grounding
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations Jun 30, 2022 Language Modeling Language Modelling
Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension May 21, 2024 3D visual grounding Referring Expression
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data Oct 28, 2023 3D visual grounding Autonomous Vehicles
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning Apr 30, 2022 Attribute Decoder
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models Dec 6, 2023 Autonomous Driving Autonomous Vehicles
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution Oct 1, 2022 coreference-resolution Coreference Resolution
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring Mar 1, 2021 3D visual grounding Attribute
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Mar 19, 2024 Reinforcement Learning (RL) Visual Grounding
Instruction-Following Agents with Multimodal Transformer Oct 24, 2022 Instruction Following Visual Grounding
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Aug 29, 2022 cross-modal alignment Image-text Retrieval
Grounded Situation Recognition with Transformers Nov 19, 2021 Decoder Grounded Situation Recognition
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models Sep 24, 2021 Visual Grounding
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans May 23, 2023 3D Reconstruction 3D visual grounding
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding Nov 25, 2022 3D visual grounding Knowledge Distillation
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection Dec 22, 2023 Attribute object-detection
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities Aug 23, 2024 Language Modeling Language Modelling
Guessing State Tracking for Visual Dialogue Feb 24, 2020 Visual Grounding
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation Jul 1, 2024 Image-text Retrieval Question Answering
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection Apr 13, 2022 3D visual grounding Visual Grounding
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Sep 29, 2022 3D visual grounding Object
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game Mar 13, 2025 Multimodal Reasoning Question Answering
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation Apr 5, 2021 Object Visual Grounding
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding Oct 22, 2022 3D visual grounding Sentence
Improving One-stage Visual Grounding by Recursive Sub-query Construction Aug 3, 2020 Sentence Sentence Embedding
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding Mar 16, 2022 Language Modelling Natural Language Queries
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning Mar 29, 2025 Chart Question Answering Chart Understanding
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling Feb 7, 2022 Language Modeling Language Modelling