
3D dense captioning

Dense captioning in 3D point clouds is an emerging vision-and-language task that requires object-level understanding of 3D scenes. Beyond the coarse semantic class prediction and bounding box regression of traditional 3D object detection, 3D dense captioning also produces a finer-grained, instance-level label for each object of interest: a natural-language description of the object's visual appearance and its spatial relations within the scene.
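To make the output format concrete, here is a minimal sketch of what a single 3D dense captioning prediction might carry (class, box, caption), together with a 3D IoU check for matching a prediction to a ground-truth object. It assumes axis-aligned boxes parameterized by center and size; the class `DenseCaption3D`, the functions `iou_3d` and `best_match`, and the 0.5 IoU threshold are hypothetical illustrations, not the schema or evaluation protocol of any paper listed below.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DenseCaption3D:
    """One predicted (or annotated) object: hypothetical schema, axis-aligned box assumed."""
    label: str      # coarse semantic class, as in 3D object detection
    center: Vec3    # box center (x, y, z)
    size: Vec3      # box extents (dx, dy, dz)
    caption: str    # natural-language description of appearance and spatial relations


def _bounds(center: Vec3, size: Vec3) -> Tuple[Vec3, Vec3]:
    """Convert center/size to min and max corners of an axis-aligned box."""
    lo = tuple(c - s / 2 for c, s in zip(center, size))
    hi = tuple(c + s / 2 for c, s in zip(center, size))
    return lo, hi


def iou_3d(a: DenseCaption3D, b: DenseCaption3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    a_lo, a_hi = _bounds(a.center, a.size)
    b_lo, b_hi = _bounds(b.center, b.size)
    inter = 1.0
    for i in range(3):
        overlap = min(a_hi[i], b_hi[i]) - max(a_lo[i], b_lo[i])
        if overlap <= 0:
            return 0.0  # boxes are disjoint along this axis
        inter *= overlap
    vol_a = a.size[0] * a.size[1] * a.size[2]
    vol_b = b.size[0] * b.size[1] * b.size[2]
    return inter / (vol_a + vol_b - inter)


def best_match(pred: DenseCaption3D, gts: Sequence[DenseCaption3D],
               iou_threshold: float = 0.5) -> Optional[DenseCaption3D]:
    """Return the ground-truth object with the highest IoU, or None if below threshold."""
    best, best_iou = None, 0.0
    for gt in gts:
        score = iou_3d(pred, gt)
        if score > best_iou:
            best, best_iou = gt, score
    return best if best_iou >= iou_threshold else None


if __name__ == "__main__":
    pred = DenseCaption3D("chair", (0.0, 0.0, 0.5), (1.0, 1.0, 1.0),
                          "a brown wooden chair next to the table")
    gt = DenseCaption3D("chair", (0.5, 0.0, 0.5), (1.0, 1.0, 1.0),
                        "a wooden chair pushed against the table")
    print(iou_3d(pred, gt))           # two unit cubes shifted by 0.5 along x
    print(best_match(pred, [gt]))     # None under the illustrative 0.5 threshold
```

Only once a predicted box is matched to an object does it make sense to compare its caption against that object's reference descriptions; the matching rule and threshold here are illustrative choices.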

Papers

Showing 1–10 of 26 papers

Title | Status | Hype
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning | Code | 3
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes | Code | 2
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | Code | 2
An Embodied Generalist Agent in 3D World | Code | 2
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Code | 1
End-to-End 3D Dense Captioning with Vote2Cap-DETR | Code | 1
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1

No leaderboard results yet.