| MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes | Mar 10, 2022 | 3D dense captioning, Dense Captioning | Code Available | 1 |
| X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning | Mar 2, 2022 | 3D dense captioning, Dense Captioning | Code Available | 1 |
| Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization | Nov 1, 2021 | Dense Captioning, Image Generation | Code Available | 1 |
| Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | Oct 21, 2021 | Dense Captioning, Image Generation | Code Available | 1 |
| Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | Jun 21, 2020 | Dense Captioning, Dense Video Captioning | Code Available | 1 |
| Dense-Captioning Events in Videos | May 2, 2017 | Dense Captioning, Retrieval | Code Available | 1 |
| Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning | Unverified | 0 |
| 3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | Dec 9, 2024 | 3D dense captioning, 3D visual grounding | Unverified | 0 |
| 3D Scene Graph Guided Vision-Language Pre-training | Nov 27, 2024 | 3D dense captioning, 3D visual grounding | Unverified | 0 |
| Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving | Sep 10, 2024 | 3D dense captioning, Autonomous Driving | Unverified | 0 |