| Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Jan 1, 2023 | 3D dense captioning, 3D visual grounding | Code Available | 1 |
| Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | Jun 21, 2020 | Dense Captioning, Dense Video Captioning | Code Available | 1 |
| Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner | May 19, 2023 | Dense Captioning, Image Captioning | Code Available | 1 |
| Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | Oct 21, 2021 | Dense Captioning, Image Generation | Code Available | 1 |
| PerLA: Perceptive 3D Language Assistant | Nov 29, 2024 | Dense Captioning, Graph Neural Network | Code Available | 1 |
| TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | May 2, 2025 | Dense Captioning, Highlight Detection | Code Available | 1 |
| Complete 3D Relationships Extraction Modality Alignment Network for 3D Dense Captioning | Aug 1, 2024 | 3D dense captioning, 3D Object Detection | Unverified | 0 |
| Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning | Unverified | 0 |
| DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | Apr 14, 2024 | Dense Captioning, Language Modelling | Unverified | 0 |
| A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | Mar 12, 2024 | 3D dense captioning, Dense Captioning | Unverified | 0 |