Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention | May 28, 2024 | 3D Object Detection, 3D visual grounding | Unverified
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | May 27, 2024 | Visual Grounding | Unverified
Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding | May 24, 2024 | 3D visual grounding, Autonomous Driving | Unverified
Adversarial Robustness for Visual Grounding of Multimodal Large Language Models | May 16, 2024 | Adversarial Attack, Adversarial Robustness | Code Available
Visual grounding for desktop graphical user interfaces | May 5, 2024 | Language Modeling, Language Modelling | Unverified
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Apr 30, 2024 | 3D visual grounding, Visual Grounding | Unverified
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models | Apr 26, 2024 | Game Design, Image Generation | Unverified
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Apr 17, 2024 | 3D dense captioning, 3D visual grounding | Code Available
MedRG: Medical Report Grounding with Multi-modal Large Language Model | Apr 10, 2024 | Decoder, Language Modeling | Unverified
Data-Efficient 3D Visual Grounding via Order-Aware Referring | Mar 25, 2024 | 3D visual grounding, Object | Unverified
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery | Mar 22, 2024 | Language Modeling, Language Modelling | Unverified
VidLA: Video-Language Alignment at Scale | Mar 21, 2024 | Language Modelling, Visual Grounding | Unverified
Learning from Synthetic Data for Visual Grounding | Mar 20, 2024 | Language Modelling, Large Language Model | Unverified
WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar | Mar 19, 2024 | Autonomous Navigation, Referring Expression | Unverified
Right Place, Right Time! Dynamizing Topological Graphs for Embodied Navigation | Mar 14, 2024 | Decision Making, Language Modeling | Unverified
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Mar 13, 2024 | 3D visual grounding, cross-modal alignment | Code Available
Detecting Concrete Visual Tokens for Multimodal Machine Translation | Mar 5, 2024 | Machine Translation, Multimodal Machine Translation | Unverified
Adversarial Testing for Visual Grounding via Image-Aware Property Reduction | Mar 2, 2024 | Visual Grounding | Unverified
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | Feb 27, 2024 | Language Modeling, Language Modelling | Unverified
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | Feb 9, 2024 | Hallucination, Natural Language Understanding | Code Available
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations | Feb 2, 2024 | Contrastive Learning, Object | Unverified
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling | Feb 1, 2024 | Diversity, Image Captioning | Unverified
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | Jan 29, 2024 | Language Modeling, Language Modelling | Unverified
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding | Jan 17, 2024 | 3D visual grounding, Scene Understanding | Unverified
Uncovering the Full Potential of Visual Grounding Methods in VQA | Jan 15, 2024 | Question Answering, Visual Grounding | Code Available
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | Jan 3, 2024 | Question Answering, Visual Grounding | Unverified
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | Jan 1, 2024 | Image Segmentation, Semantic Segmentation | Unverified
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach | Jan 1, 2024 | Scene Understanding, Visual Grounding | Unverified
Viewpoint-Aware Visual Grounding in 3D Scenes | Jan 1, 2024 | 3D visual grounding, Referring Expression | Unverified
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding | Jan 1, 2024 | Attribute, Relation | Code Available
Multi-Attribute Interactions Matter for 3D Visual Grounding | Jan 1, 2024 | 3D visual grounding, Attribute | Code Available
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency | Jan 1, 2024 | 3D visual grounding, Relation | Code Available
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding | Jan 1, 2024 | Scene Understanding, Visual Grounding | Unverified
G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding | Jan 1, 2024 | 3D visual grounding, Visual Grounding | Unverified
Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation | Dec 29, 2023 | Visual Grounding | Unverified
Cycle-Consistency Learning for Captioning and Grounding | Dec 23, 2023 | Image Captioning, Visual Grounding | Unverified
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment | Dec 15, 2023 | 3D visual grounding, Natural Language Queries | Unverified
Visual Grounding of Whole Radiology Reports for 3D CT Images | Dec 8, 2023 | Segmentation, Visual Grounding | Unverified
Improved Visual Grounding through Self-Consistent Explanations | Dec 7, 2023 | Language Modelling, Large Language Model | Unverified
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment | Dec 5, 2023 | Explanation Generation, Visual Grounding | Code Available
Uni3DL: Unified Model for 3D and Language Understanding | Dec 5, 2023 | Cross-Modal Retrieval, Instance Segmentation | Unverified
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment | Dec 4, 2023 | Grounded language learning, Language Modeling | Unverified
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training | Dec 3, 2023 | object-detection, Object Detection | Code Available
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models | Dec 3, 2023 | Hallucination, Visual Grounding | Code Available
Context-Aware Indoor Point Cloud Object Generation through User Instructions | Nov 26, 2023 | Position, Visual Grounding | Unverified
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models | Nov 21, 2023 | Image Segmentation, Language Modelling | Code Available
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis | Oct 31, 2023 | Descriptive, Medical Image Analysis | Unverified
GROOViST: A Metric for Grounding Objects in Visual Storytelling | Oct 26, 2023 | Visual Grounding, Visual Storytelling | Code Available
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network | Oct 25, 2023 | Visual Grounding | Code Available
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Oct 18, 2023 | Benchmarking, Visual Grounding | Code Available