SOTAVerified

3D dense captioning

Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Beyond the coarse semantic class prediction and bounding box regression of traditional 3D object detection, 3D dense captioning aims to produce a finer, instance-level label for each object of interest in the scene: a natural language description of its visual appearance and spatial relations.
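To make the difference from plain 3D detection concrete, here is a minimal sketch of what a per-object output might look like. The class names (`Detection3D`, `DenseCaption3D`), the field layout, and the example values are all hypothetical and chosen for illustration only; they are not taken from any of the listed papers or from a specific dataset format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection3D:
    """What a conventional 3D detector predicts for one object (assumed layout)."""
    center: Tuple[float, float, float]  # box center (x, y, z); units are an assumption
    size: Tuple[float, float, float]    # box extents along each axis
    label: str                          # coarse semantic class, e.g. "chair"


@dataclass
class DenseCaption3D(Detection3D):
    """Dense captioning adds a free-form description to each detected object."""
    caption: str = ""  # natural-language text on appearance and spatial relations


def describe_scene(captions: List[DenseCaption3D]) -> List[str]:
    """Format one line per object: class label followed by its caption."""
    return [f"{c.label}: {c.caption}" for c in captions]


# Hypothetical, hand-written example values (not model output).
scene = [
    DenseCaption3D((1.0, 0.5, 0.4), (0.6, 0.6, 0.9), "chair",
                   "a brown wooden chair to the left of the desk"),
    DenseCaption3D((2.0, 0.5, 0.4), (1.2, 0.7, 0.8), "desk",
                   "a white desk under the window"),
]
lines = describe_scene(scene)
```

The only structural change relative to detection is the extra `caption` field, which is why many methods pair a detector with a caption generator that runs once per predicted box.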

Papers

Showing 1-25 of 26 papers

| Title | Status | Hype |
| --- | --- | --- |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Code | 3 |
| TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Code | 2 |
| LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Code | 2 |
| An Embodied Generalist Agent in 3D World | Code | 2 |
| Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1 |
| End-to-End 3D Dense Captioning with Vote2Cap-DETR | Code | 1 |
| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1 |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1 |
| MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1 |
| X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Code | 0 |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | | 0 |
| 3D Scene Graph Guided Vision-Language Pre-training | | 0 |
| MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation | Code | 0 |
| Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving | | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | | 0 |
| Complete 3d relationships extraction modality alignment network for 3d dense captioning | | 0 |
| Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Code | 0 |
| A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | | 0 |
| UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | | 0 |
| Contextual Modeling for 3D Dense Captioning on Point Clouds | | 0 |
| 0/1 Deep Neural Networks via Block Coordinate Descent | | 0 |
| 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |

No leaderboard results yet.