SOTAVerified

Dense Captioning

Papers

Showing 1-10 of 69 papers

Title | Status | Hype
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving | Code | 1
Does Your 3D Encoder Really Work? When Pretrain-SFT from 2D VLMs Meets 3D VLMs | | 0
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action | Code | 1
3D Spatial Understanding in MLLMs: Disambiguation and Evaluation | | 0
PerLA: Perceptive 3D Language Assistant | Code | 1
3D Scene Graph Guided Vision-Language Pre-training | | 0
ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Code | 1
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving | | 0
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations | | 0
See It All: Contextualized Late Aggregation for 3D Dense Captioning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | ControlCap | mAP | 18.2 | | Unverified
2 | GRiT (ViT-B) | mAP | 15.5 | | Unverified
3 | CAG-Net | mAP | 10.5 | | Unverified
4 | FCLN | mAP | 5.4 | | Unverified