SOTAVerified

3D visual grounding

Papers

Showing 51-82 of 82 papers

| Title | Status | Hype |
| --- | --- | --- |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | | 0 |
| SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding | | 0 |
| 3D Scene Graph Guided Vision-Language Pre-training | | 0 |
| LidaRefer: Outdoor 3D Visual Grounding for Autonomous Driving with Transformers | | 0 |
| Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding | | 0 |
| Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding | | 0 |
| Bayesian Self-Training for Semi-Supervised 3D Segmentation | | 0 |
| Task-oriented Sequential Grounding in 3D Scenes | | 0 |
| PD-APE: A Parallel Decoding Framework with Adaptive Position Encoding for 3D Visual Grounding | | 0 |
| ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities | | 0 |
| Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding | | 0 |
| Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention | | 0 |
| Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding | | 0 |
| Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | | 0 |
| Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Code | 0 |
| Data-Efficient 3D Visual Grounding via Order-Aware Referring | | 0 |
| SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention | Code | 0 |
| SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding | | 0 |
| Viewpoint-Aware Visual Grounding in 3D Scenes | | 0 |
| Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency | Code | 0 |
| G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding | | 0 |
| Multi-Attribute Interactions Matter for 3D Visual Grounding | Code | 0 |
| Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment | | 0 |
| Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding | | 0 |
| 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding | | 0 |
| WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language | Code | 0 |
| ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding | Code | 0 |
| ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | | 0 |
| UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |
| TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding | | 0 |
| LanguageRefer: Spatial-Language Model for 3D Visual Grounding | | 0 |

No leaderboard results yet.