SOTAVerified

Dense Video Captioning

Most natural videos contain numerous events. For example, a video of a “man playing a piano” might also contain “another man dancing” or “a crowd clapping”. Dense video captioning is the task of both detecting these events, that is, localizing each one in time, and describing each one in natural language.
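To make “detecting and describing” concrete, here is a minimal sketch. It assumes each detected event is a (start, end, caption) triple with times in seconds, and it matches predicted segments to reference segments by temporal intersection-over-union (tIoU). The names `Event`, `temporal_iou`, `matched_pairs`, and `mean_recall` are hypothetical. This is a simplified illustration, not any benchmark's official evaluation code. The thresholds 0.3/0.5/0.7/0.9 mirror those commonly used for ActivityNet Captions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One dense-captioning item: a temporal segment plus its caption (hypothetical format)."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    caption: str


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection-over-union of the two events' time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def matched_pairs(preds, refs, threshold=0.5):
    """(prediction, reference) pairs whose segments overlap with tIoU >= threshold."""
    return [(p, r) for p in preds for r in refs if temporal_iou(p, r) >= threshold]


def mean_recall(preds, refs, thresholds=(0.3, 0.5, 0.7, 0.9)):
    """Fraction of reference events hit by some prediction, averaged over tIoU thresholds."""
    if not refs:
        return 0.0
    per_threshold = []
    for t in thresholds:
        hits = sum(any(temporal_iou(p, r) >= t for p in preds) for r in refs)
        per_threshold.append(hits / len(refs))
    return sum(per_threshold) / len(per_threshold)


# Illustrative data built from the example events above (not from any dataset).
refs = [
    Event(0.0, 30.0, "a man plays the piano"),
    Event(12.0, 20.0, "another man dances"),
]
preds = [
    Event(2.0, 28.0, "a man is playing a piano"),
    Event(40.0, 50.0, "a crowd claps"),
]

print(round(temporal_iou(preds[0], refs[0]), 3))  # 0.867
print(len(matched_pairs(preds, refs, 0.5)))       # 1
print(mean_recall(preds, refs))                   # 0.5
```

A full evaluation would then score the captions of matched pairs with language metrics such as METEOR or CIDEr. That step is omitted here.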

Papers

Showing 1–10 of 76 papers

Title | Status | Hype
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | | 0
Dense Video Captioning using Graph-based Sentence Summarization | | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding | | 0
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Code | 1
Event-Equalized Dense Video Captioning | | 0
HiCM^2: Hierarchical Compact Memory Modeling for Dense Video Captioning | Code | 1
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Code | 0
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning | | 0
Video LLMs for Temporal Reasoning in Long Videos | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 55.7 | | Unverified