SOTAVerified

3D visual grounding

Papers

Showing 26–50 of 82 papers

| Title | Status | Hype |
|---|---|---|
| Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding | Code | 1 |
| Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis | Code | 1 |
| ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance | Code | 1 |
| Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding | Code | 1 |
| Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding | Code | 1 |
| MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding | Code | 1 |
| AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding | Code | 0 |
| Ges3ViG : Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding | Code | 0 |
| Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding | Code | 0 |
| SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Code | 0 |
| Multi-Attribute Interactions Matter for 3D Visual Grounding | Code | 0 |
| Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency | Code | 0 |
| Beyond Human Perception: Understanding Multi-Object World from Monocular View | Code | 0 |
| ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding | Code | 0 |
| Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Code | 0 |
| WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language | Code | 0 |
| Zero-Shot 3D Visual Grounding from Vision-Language Models | | 0 |
| 3D Scene Graph Guided Vision-Language Pre-training | | 0 |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | | 0 |
| A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | | 0 |
| AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring | | 0 |
| Bayesian Self-Training for Semi-Supervised 3D Segmentation | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |
| DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding | | 0 |
| Data-Efficient 3D Visual Grounding via Order-Aware Referring | | 0 |

No leaderboard results yet.