| RUC+CMU: System Report for Dense Captioning Events in Videos | Jun 22, 2018 | Caption Generation, Dense Captioning | —Unverified | 0 | 0 |
| SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions | Jul 24, 2022 | Dense Captioning, Dense Video Captioning | —Unverified | 0 | 0 |
| Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | Dec 3, 2020 | 3D dense captioning, 3D Object Detection | —Unverified | 0 | 0 |
| Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | Mar 18, 2024 | 3D Question Answering (3D-QA), Dense Captioning | —Unverified | 0 | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | Aug 14, 2024 | 3D dense captioning, All | —Unverified | 0 | 0 |
| Semantic-Aware Pretraining for Dense Video Captioning | Apr 13, 2022 | Dense Captioning, Dense Video Captioning | —Unverified | 0 | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioning, Attribute | —Unverified | 0 | 0 |
| Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos | Jun 25, 2018 | Dense Captioning, Optical Flow Estimation | —Unverified | 0 | 0 |
| Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning | Jun 14, 2020 | Dense Captioning, Dense Video Captioning | —Unverified | 0 | 0 |
| 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds | Jan 1, 2022 | 3D dense captioning, Attribute | —Unverified | 0 | 0 |
| Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos | Jul 11, 2019 | Dense Captioning, Dense Video Captioning | —Unverified | 0 | 0 |
| A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | Mar 12, 2024 | 3D dense captioning, Dense Captioning | —Unverified | 0 | 0 |
| Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019 | Jun 14, 2019 | Action Localization, Action Recognition | —Unverified | 0 | 0 |
| UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | Dec 1, 2022 | 3D dense captioning, 3D visual grounding | —Unverified | 0 | 0 |
| Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation | Jul 4, 2017 | Dense Captioning, Machine Translation | —Unverified | 0 | 0 |
| Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | Jun 5, 2025 | cross-modal alignment, Dense Captioning | —Unverified | 0 | 0 |
| Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | Mar 19, 2024 | Dense Captioning, Image Captioning | —Unverified | 0 | 0 |
| FlexCap: Describe Anything in Images in Controllable Detail | Mar 18, 2024 | Attribute, Dense Captioning | —Unverified | 0 | 0 |
| Fooling Vision and Language Models Despite Localization and Attention Mechanism | Sep 25, 2017 | Dense Captioning, Natural Language Understanding | —Unverified | 0 | 0 |