3D dense captioning

Dense captioning in 3D point clouds is an emerging vision-and-language task that requires object-level 3D scene understanding. Beyond the coarse semantic class prediction and bounding box regression of traditional 3D object detection, 3D dense captioning aims to produce a finer, instance-level natural language description of the visual appearance and spatial relations of each object of interest in the scene.

Papers

Showing 21–26 of 26 papers

| Title | Status | Hype |
| --- | --- | --- |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Code | 1 |
| MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Code | 1 |
| X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Code | 1 |
| 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | | 0 |
| Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | | 0 |

No leaderboard results yet.