| WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language | Apr 12, 2023 | 3D visual grounding, Autonomous Driving | Code Available | 0 | 5 |
| Zero-Shot 3D Visual Grounding from Vision-Language Models | May 28, 2025 | 3D visual grounding, Visual Grounding | Unverified | 0 | 0 |
| 3D Scene Graph Guided Vision-Language Pre-training | Nov 27, 2024 | 3D dense captioning, 3D visual grounding | Unverified | 0 | 0 |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | Dec 9, 2024 | 3D dense captioning, 3D visual grounding | Unverified | 0 | 0 |
| A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Jul 9, 2025 | 3D visual grounding, Autonomous Navigation | Unverified | 0 | 0 |
| AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring | Jan 16, 2025 | 3D visual grounding, Decoder | Unverified | 0 | 0 |
| Bayesian Self-Training for Semi-Supervised 3D Segmentation | Sep 12, 2024 | 3D Instance Segmentation, 3D Semantic Segmentation | Unverified | 0 | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning, 3D visual grounding | Unverified | 0 | 0 |
| DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | May 8, 2025 | 3D visual grounding, cross-modal alignment | Unverified | 0 | 0 |
| Data-Efficient 3D Visual Grounding via Order-Aware Referring | Mar 25, 2024 | 3D visual grounding, Object | Unverified | 0 | 0 |