Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network | Oct 25, 2023 | Visual Grounding | Code Available | 0
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding | Oct 22, 2023 | Novel Concepts, object-detection | Code Available | 1
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes | Oct 20, 2023 | Image Captioning, Language Acquisition | Code Available | 1
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding | Code Available | 0
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning | Oct 17, 2023 | Segmentation, Visual Grounding | Code Available | 0
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | Oct 17, 2023 | Interactive Segmentation, Referring Expression | Code Available | 4
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Oct 14, 2023 | Image Classification, Image Description | Code Available | 7
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models | Oct 13, 2023 | Hallucination, Image Captioning | Code Available | 2
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding | Oct 10, 2023 | 3D visual grounding, Visual Grounding | Code Available | 1
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models | Oct 9, 2023 | Language Modelling, Question Answering | Code Available | 1
Lightweight In-Context Tuning for Multimodal Unified Models | Oct 8, 2023 | Image Captioning, In-Context Learning | Unverified | 0
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent | Sep 21, 2023 | 3D visual grounding, Language Modeling | Code Available | 2
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection | Sep 18, 2023 | 3D Object Detection, 3D Open-Vocabulary Object Detection | Unverified | 0
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping | Sep 14, 2023 | Object, Object Discovery | Code Available | 1
Multi3DRefer: Grounding Text Description to Multiple 3D Objects | Sep 11, 2023 | 3D visual grounding, Contrastive Learning | Code Available | 1
Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution, Image Retrieval | Code Available | 0
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding | Sep 8, 2023 | 3D Instance Segmentation, 3D visual grounding | Unverified | 0
Interpretable Visual Question Answering via Reasoning Supervision | Sep 7, 2023 | Common Sense Reasoning, Question Answering | Unverified | 0
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners | Sep 7, 2023 | Diagnostic, Visual Grounding | Code Available | 0
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders | Sep 3, 2023 | Visual Grounding | Code Available | 1
FACET: Fairness in Computer Vision Evaluation Benchmark | Aug 31, 2023 | Fairness, image-classification | Unverified | 0
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model | Aug 30, 2023 | Language Modeling, Language Modelling | Unverified | 0
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Aug 28, 2023 | Question Answering, Retrieval | Code Available | 1
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks | Aug 24, 2023 | Language Modeling, Language Modelling | Code Available | 0
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering, FS-MEVQA | Code Available | 5
A Unified Framework for 3D Point Cloud Visual Grounding | Aug 23, 2023 | CPU, GPU | Code Available | 1
Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation | Aug 22, 2023 | Visual Grounding | Code Available | 1
Language-Guided Diffusion Model for Visual Grounding | Aug 18, 2023 | cross-modal alignment, Denoising | Code Available | 0
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Aug 8, 2023 | 3D Question Answering (3D-QA), Dense Captioning | Code Available | 2
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding | Jul 25, 2023 | 3D visual grounding, Object | Unverified | 0
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision | Jul 23, 2023 | Decoder, Visual Grounding | Code Available | 1
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method | Jul 21, 2023 | Image-text matching, Text Matching | Code Available | 1
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding | Jul 18, 2023 | 3D visual grounding, Object | Code Available | 1
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs | Jul 17, 2023 | Instruction Following, Sentence | Code Available | 2
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation | Jul 12, 2023 | Lifelong learning, Object Detection | Code Available | 0
OG: Equip vision occupancy with instance segmentation and visual grounding | Jul 12, 2023 | Instance Segmentation, Segmentation | Unverified | 0
What Do Self-Supervised Speech Models Know About Words? | Jun 30, 2023 | Sentence, Sentence Similarity | Code Available | 1
Learning with Difference Attention for Visually Grounded Self-supervised Representations | Jun 26, 2023 | Self-Supervised Learning, Visual Grounding | Unverified | 0
Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning, In-Context Learning | Code Available | 1
Extending CLIP's Image-Text Alignment to Referring Image Segmentation | Jun 14, 2023 | Image Segmentation, Referring Expression Segmentation | Unverified | 0
Referring to Screen Texts with Voice Assistants | Jun 10, 2023 | Navigate, Visual Grounding | Unverified | 0
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards | Jun 7, 2023 | Diversity, Image Captioning | Code Available | 1
Language Adaptive Weight Generation for Multi-task Visual Grounding | Jun 6, 2023 | Referring Expression, Referring Expression Comprehension | Code Available | 0
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations | Jun 4, 2023 | Visual Grounding, Word Embeddings | Code Available | 0
Benchmarking Diverse-Modal Entity Linking with Generative Models | May 27, 2023 | Benchmarking, Decoder | Unverified | 0
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving | May 25, 2023 | 3D Object Detection, Autonomous Driving | Unverified | 0
Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding | Code Available | 0
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics | May 24, 2023 | Image Captioning, Negation | Code Available | 0
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans | May 23, 2023 | 3D Reconstruction, 3D visual grounding | Code Available | 1
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model | May 19, 2023 | Language Modeling, Language Modelling | Code Available | 1