Multi-branch Collaborative Learning Network for 3D Visual Grounding Jul 7, 2024 3D visual grounding Referring Expression
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts Nov 16, 2021 Cross-Modal Retrieval Image Captioning
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation Jul 1, 2024 Image-text Retrieval Question Answering
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Nov 23, 2021 Image Captioning Image Description
A Unified Framework for 3D Point Cloud Visual Grounding Aug 23, 2023 CPU GPU
Multi-View Transformer for 3D Visual Grounding Apr 5, 2022 3D visual grounding Visual Grounding
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans May 23, 2023 3D Reconstruction 3D visual grounding
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models Sep 24, 2021 Visual Grounding
A Fast and Accurate One-Stage Approach to Visual Grounding Aug 18, 2019 Referring Expression Referring Expression Comprehension
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Apr 26, 2021 Generalized Referring Expression Comprehension Phrase Grounding
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding Mar 5, 2024 3D visual grounding Decision Making
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding Oct 10, 2023 3D visual grounding Visual Grounding
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions Feb 17, 2024 Visual Grounding
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding Jul 18, 2023 3D visual grounding Object
Mask Grounding for Referring Image Segmentation Dec 19, 2023 cross-modal alignment Image Segmentation
MixGen: A New Multi-Modal Data Augmentation Jun 16, 2022 Data Augmentation Image-text Retrieval
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition Feb 15, 2024 Grounded Multimodal Named Entity Recognition Multi-modal Named Entity Recognition
Local-Global Context Aware Transformer for Language-Guided Video Segmentation Mar 18, 2022 Referring Expression Segmentation Referring Video Object Segmentation
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling Mar 21, 2024 Grounded language learning Language Acquisition
Visual Grounding Methods for VQA are Working for the Wrong Reasons! Apr 12, 2020 Question Answering Visual Grounding
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory Mar 19, 2024 Adversarial Text Diversity
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding Nov 25, 2022 3D visual grounding Knowledge Distillation
Context Disentangling and Prototype Inheriting for Robust Visual Grounding Dec 19, 2023 Visual Grounding
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards Jun 7, 2023 Diversity Image Captioning
3D Vision and Language Pretraining with Large-Scale Synthetic Data Jul 8, 2024 Dense Captioning Diversity
SAT: 2D Semantics Assisted Training for 3D Visual Grounding May 24, 2021 3D visual grounding Object
Learning Cross-modal Context Graph for Visual Grounding Nov 20, 2019 Graph Matching Graph Neural Network
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training Jan 1, 2023 3D dense captioning 3D visual grounding
Connecting What to Say With Where to Look by Modeling Human Attention Traces May 12, 2021 Caption Generation Image Captioning
Look Before You Leap: Learning Landmark Features for One-Stage Visual Grounding Apr 9, 2021 Descriptive Object
Visual Grounding for Object-Level Generalization in Reinforcement Learning Aug 4, 2024 Language Modelling Object
Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding Jan 1, 2023 Descriptive Object
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision Jul 23, 2023 Decoder Visual Grounding
Joint Visual Grounding and Tracking with Natural Language Specification Mar 21, 2023 Visual Grounding Visual Tracking
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning
Instruction-Guided Visual Masking May 30, 2024 Instruction Following Visual Grounding
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method Jul 21, 2023 Image-text matching Text Matching
Instruction-Following Agents with Multimodal Transformer Oct 24, 2022 Instruction Following Visual Grounding
Collaborative Transformers for Grounded Situation Recognition Mar 30, 2022 Grounded Situation Recognition Image Classification
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection Nov 5, 2023 Anomaly Detection Question Answering
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model Jan 21, 2025 Hallucination Image Captioning
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Nov 10, 2023 Diversity Multi-Task Learning
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter Nov 9, 2023 Object Visual Grounding
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations Jun 30, 2022 Language Modeling Language Modelling
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving May 13, 2025 3D visual grounding Autonomous Driving
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding May 15, 2023 Diversity Transfer Learning
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning Apr 30, 2022 Attribute Decoder
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring Mar 1, 2021 3D visual grounding Attribute
Fine-Grained Semantically Aligned Vision-Language Pre-Training Aug 4, 2022 cross-modal alignment object-detection
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision Dec 14, 2021 Contrastive Learning Representation Learning