SOTAVerified

3D dense captioning

Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Beyond the coarse semantic class prediction and bounding box regression of traditional 3D object detection, 3D dense captioning aims to produce a finer, instance-level natural language description of the visual appearance and spatial relations of each object of interest in the scene.

Papers

Showing 21–26 of 26 papers

Title | Status | Hype
Contextual Modeling for 3D Dense Captioning on Point Clouds | | 0
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0
0/1 Deep Neural Networks via Block Coordinate Descent | | 0
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | | 0
See It All: Contextualized Late Aggregation for 3D Dense Captioning | | 0
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | | 0

No leaderboard results yet.