NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning (Oct 17, 2023). Tasks: Segmentation, Visual Grounding. Status: Code available.
Lightweight In-Context Tuning for Multimodal Unified Models (Oct 8, 2023). Tasks: Image Captioning, In-Context Learning. Status: Unverified.
Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection (Sep 18, 2023). Tasks: 3D Object Detection, 3D Open-Vocabulary Object Detection. Status: Unverified.
Collecting Visually-Grounded Dialogue with A Game Of Sorts (Sep 10, 2023). Tasks: Coreference Resolution, Image Retrieval. Status: Code available.
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding (Sep 8, 2023). Tasks: 3D Instance Segmentation, 3D visual grounding. Status: Unverified.
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners (Sep 7, 2023). Tasks: Diagnostic, Visual Grounding. Status: Code available.
Interpretable Visual Question Answering via Reasoning Supervision (Sep 7, 2023). Tasks: Common Sense Reasoning, Question Answering. Status: Unverified.
FACET: Fairness in Computer Vision Evaluation Benchmark (Aug 31, 2023). Tasks: Fairness, image-classification. Status: Unverified.
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model (Aug 30, 2023). Tasks: Language Modeling, Language Modelling. Status: Unverified.
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks (Aug 24, 2023). Tasks: Language Modeling, Language Modelling. Status: Code available.
Language-Guided Diffusion Model for Visual Grounding (Aug 18, 2023). Tasks: cross-modal alignment, Denoising. Status: Code available.
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding (Jul 25, 2023). Tasks: 3D visual grounding, Object. Status: Unverified.
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation (Jul 12, 2023). Tasks: Lifelong learning, Object Detection. Status: Code available.
OG: Equip vision occupancy with instance segmentation and visual grounding (Jul 12, 2023). Tasks: Instance Segmentation, Segmentation. Status: Unverified.
Learning with Difference Attention for Visually Grounded Self-supervised Representations (Jun 26, 2023). Tasks: Self-Supervised Learning, Visual Grounding. Status: Unverified.
Extending CLIP's Image-Text Alignment to Referring Image Segmentation (Jun 14, 2023). Tasks: Image Segmentation, Referring Expression Segmentation. Status: Unverified.
Referring to Screen Texts with Voice Assistants (Jun 10, 2023). Tasks: Navigate, Visual Grounding. Status: Unverified.
Language Adaptive Weight Generation for Multi-task Visual Grounding (Jun 6, 2023). Tasks: Referring Expression, Referring Expression Comprehension. Status: Code available.
Leverage Points in Modality Shifts: Comparing Language-only and Multimodal Word Representations (Jun 4, 2023). Tasks: Visual Grounding, Word Embeddings. Status: Code available.
Benchmarking Diverse-Modal Entity Linking with Generative Models (May 27, 2023). Tasks: Benchmarking, Decoder. Status: Unverified.
Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving (May 25, 2023). Tasks: 3D Object Detection, Autonomous Driving. Status: Unverified.
Measuring Faithful and Plausible Visual Grounding in VQA (May 24, 2023). Tasks: Question Answering, Visual Grounding. Status: Code available.
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics (May 24, 2023). Tasks: Image Captioning, Negation. Status: Code available.
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding (May 19, 2023). Tasks: Sentence, Visual Grounding. Status: Unverified.
Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding (May 18, 2023). Tasks: Contrastive Learning, Object. Status: Unverified.
Sample-Specific Debiasing for Better Image-Text Models (Apr 25, 2023). Tasks: Contrastive Learning, Cross-Modal Retrieval. Status: Unverified.
Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining (Apr 20, 2023). Tasks: Visual Grounding. Status: Unverified.
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language (Apr 12, 2023). Tasks: 3D visual grounding, Autonomous Driving. Status: Code available.
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding (Mar 23, 2023). Tasks: 3D visual grounding, Visual Grounding. Status: Code available.
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment (Mar 14, 2023). Tasks: Medical Image Analysis, Phrase Grounding. Status: Unverified.
Parallel Vertex Diffusion for Unified Visual Grounding (Mar 13, 2023). Tasks: Visual Grounding. Status: Unverified.
Focusing On Targets For Improving Weakly Supervised Visual Grounding (Feb 22, 2023). Tasks: Dependency Parsing, Object. Status: Unverified.
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks (Jan 12, 2023). Tasks: Cross-Modal Retrieval, Open-Ended Question Answering. Status: Code available.
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding (Jan 1, 2023). Tasks: 3D visual grounding, Visual Grounding. Status: Unverified.
CoSign: Exploring Co-occurrence Signals in Skeleton-based Continuous Sign Language Recognition (Jan 1, 2023). Tasks: Sign Language Recognition, Visual Grounding. Status: Unverified.
Dynamic Inference With Grounding Based Vision and Language Models (Jan 1, 2023). Tasks: Language Modelling, Referring Expression. Status: Unverified.
GAFNet: A Global Fourier Self Attention Based Novel Network for multi-modal downstream tasks (Jan 1, 2023). Tasks: Image Generation, Image-text Retrieval. Status: Unverified.
Using Multiple Instance Learning to Build Multimodal Representations (Dec 11, 2022). Tasks: Contrastive Learning, Cross-Modal Retrieval. Status: Unverified.
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding (Dec 1, 2022). Tasks: 3D dense captioning, 3D visual grounding. Status: Unverified.
MNER-QG: An End-to-End MRC framework for Multimodal Named Entity Recognition with Query Grounding (Nov 27, 2022). Tasks: named-entity-recognition, Named Entity Recognition. Status: Unverified.
A survey on knowledge-enhanced multimodal learning (Nov 19, 2022). Tasks: Conditional Image Generation, Factual Visual Question Answering. Status: Unverified.
Visually Grounded VQA by Lattice-based Retrieval (Nov 15, 2022). Tasks: Information Retrieval, Question Answering. Status: Code available.
Are Current Decoding Strategies Capable of Facing the Challenges of Visual Dialogue? (Oct 24, 2022). Tasks: Informativeness, Text Generation. Status: Unverified.
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data (Oct 23, 2022). Tasks: Image Captioning, Image-text Retrieval. Status: Unverified.
A Visual Tour Of Current Challenges In Multimodal Language Models (Oct 22, 2022). Tasks: Image Generation, Text to Image Generation. Status: Unverified.
Like a bilingual baby: The advantage of visually grounding a bilingual language model (Oct 11, 2022). Tasks: Language Modeling, Language Modelling. Status: Unverified.
YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding (Oct 10, 2022). Tasks: Visual Grounding. Status: Unverified.
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning (Oct 9, 2022). Tasks: Image-text Retrieval, multimodal interaction. Status: Unverified.
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach (Oct 3, 2022). Tasks: Referring Expression, Robot Manipulation. Status: Code available.
Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement (Oct 1, 2022). Tasks: Graph Neural Network, Object. Status: Unverified.