
3D dense captioning

Dense captioning in 3D point clouds is an emerging vision-and-language task that requires object-level understanding of 3D scenes. Beyond the coarse semantic class prediction and bounding-box regression performed in traditional 3D object detection, 3D dense captioning aims to produce a finer-grained, instance-level label for each object of interest: a natural-language description of its visual appearance and of its spatial relations to other objects in the scene.
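To make the difference from plain detection concrete, here is a minimal Python sketch of what a 3D dense captioning output might look like. It is a hypothetical illustration, not the interface of any paper listed on this page. The names DetectedObject, DenseCaption, dense_caption and nearest_neighbor_captioner are invented for this example. Boxes are assumed to be axis-aligned and stored as center plus size. The rule-based captioner is a toy stand-in for a learned language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed axis-aligned 3D box: (center_x, center_y, center_z, size_x, size_y, size_z)
Box3D = Tuple[float, float, float, float, float, float]


@dataclass
class DetectedObject:
    """What a traditional 3D detector outputs: a coarse class and a box."""
    semantic_class: str
    box: Box3D


@dataclass
class DenseCaption(DetectedObject):
    """3D dense captioning additionally attaches a natural-language description."""
    caption: str


Captioner = Callable[[DetectedObject, List[DetectedObject]], str]


def dense_caption(objects: List[DetectedObject], captioner: Captioner) -> List[DenseCaption]:
    # The captioner receives both the target and the whole scene, so it can
    # describe spatial relations to other objects, not only the target itself.
    return [DenseCaption(o.semantic_class, o.box, captioner(o, objects)) for o in objects]


def nearest_neighbor_captioner(target: DetectedObject, scene: List[DetectedObject]) -> str:
    # Hypothetical toy captioner: names the object whose box center is closest.
    others = [o for o in scene if o is not target]
    if not others:
        return f"a {target.semantic_class}."

    def squared_center_distance(o: DetectedObject) -> float:
        return sum((a - b) ** 2 for a, b in zip(o.box[:3], target.box[:3]))

    nearest = min(others, key=squared_center_distance)
    return f"a {target.semantic_class} next to the {nearest.semantic_class}."
```

For a scene with a chair and a table placed close together and a bed farther away, this sketch would caption the chair as "a chair next to the table." Keeping the detector fields in a base class reflects the task definition above: a dense caption is a detection plus a description. It is not a replacement for a detection.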

Papers

Showing 21–26 of 26 papers

Title | Status | Hype
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding |  | 0
0/1 Deep Neural Networks via Block Coordinate Descent |  | 0
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding |  | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation | Code | 0
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization | Code | 0

No leaderboard results yet.