Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding Feb 23, 2024 Hallucination Object
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions Feb 17, 2024 Visual Grounding
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition Feb 15, 2024 Grounded Multimodal Named Entity Recognition Multi-modal Named Entity Recognition
Unifying Visual and Vision-Language Tracking via Contrastive Learning Jan 20, 2024 Contrastive Learning Object Tracking
Veagle: Advancements in Multimodal Representation Learning Jan 18, 2024 Image Captioning Language Modelling
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection Dec 22, 2023 Attribute object-detection
Mask Grounding for Referring Image Segmentation Dec 19, 2023 cross-modal alignment Image Segmentation
Context Disentangling and Prototype Inheriting for Robust Visual Grounding Dec 19, 2023 Visual Grounding
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation Dec 13, 2023 Descriptive Object
Mono3DVG: 3D Visual Grounding in Monocular Images Dec 13, 2023 3D Object Detection 3D visual grounding
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models Dec 6, 2023 Autonomous Driving Autonomous Vehicles
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions Nov 28, 2023 Disentanglement Referring Expression
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding Nov 26, 2023 3D visual grounding Object
InfMLLM: A Unified Framework for Visual-Language Tasks Nov 12, 2023 GPU Image Captioning
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Nov 10, 2023 Diversity Multi-Task Learning
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter Nov 9, 2023 Object Visual Grounding
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection Nov 5, 2023 Anomaly Detection Question Answering
CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data Oct 28, 2023 3D visual grounding Autonomous Vehicles
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding Oct 22, 2023 Novel Concepts object-detection
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes Oct 20, 2023 Image Captioning Language Acquisition
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding Oct 10, 2023 3D visual grounding Visual Grounding
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models Oct 9, 2023 Language Modelling Question Answering
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping Sep 14, 2023 Object Object Discovery
Multi3DRefer: Grounding Text Description to Multiple 3D Objects Sep 11, 2023 3D visual grounding Contrastive Learning
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders Sep 3, 2023 Visual Grounding
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory Aug 28, 2023 Question Answering Retrieval
A Unified Framework for 3D Point Cloud Visual Grounding Aug 23, 2023 CPU GPU
Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation Aug 22, 2023 Visual Grounding
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision Jul 23, 2023 Decoder Visual Grounding
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method Jul 21, 2023 Image-text matching Text Matching
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding Jul 18, 2023 3D visual grounding Object
What Do Self-Supervised Speech Models Know About Words? Jun 30, 2023 Sentence Sentence Similarity
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards Jun 7, 2023 Diversity Image Captioning
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans May 23, 2023 3D Reconstruction 3D visual grounding
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model May 19, 2023 Language Modeling Language Modelling
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding May 15, 2023 Diversity Transfer Learning
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance Mar 29, 2023 3D visual grounding Visual Grounding
Joint Visual Grounding and Tracking with Natural Language Specification Mar 21, 2023 Visual Grounding Visual Tracking
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training Jan 1, 2023 3D dense captioning 3D visual grounding
Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding Jan 1, 2023 Descriptive Object
Position-guided Text Prompt for Vision-Language Pre-training Dec 19, 2022 Cross-Modal Retrieval Image Captioning
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding Nov 28, 2022 object-detection Object Detection
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding Nov 25, 2022 3D visual grounding Knowledge Distillation
YORO -- Lightweight End to End Visual Grounding Nov 15, 2022 Natural Language Queries Visual Grounding
Instruction-Following Agents with Multimodal Transformer Oct 24, 2022 Instruction Following Visual Grounding
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding Oct 22, 2022 3D visual grounding Sentence
GRAVL-BERT: Graphical Visual-Linguistic Representations for Multimodal Coreference Resolution Oct 1, 2022 coreference-resolution Coreference Resolution
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Sep 29, 2022 3D visual grounding Object
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment Aug 29, 2022 cross-modal alignment Image-text Retrieval