SOTAVerified

Video Summarization

Video summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The summary is usually composed of either a set of representative video frames (a.k.a. video key-frames) or a set of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
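
The two summary types described above can be illustrated with a small sketch. It assumes that importance scores for frames and fragments are already available (in practice a trained model would produce them). The function names `storyboard` and `skim`, the `(start, end, score)` fragment layout, and the greedy budget-filling strategy are all illustrative assumptions, not a method taken from any of the listed papers.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames and return their indices
    in chronological order (a key-frame storyboard)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def skim(fragments: List[Tuple[int, int, float]], budget: int) -> List[Tuple[int, int]]:
    """Pick fragments given as (start, end, score), with end exclusive,
    until the frame budget is used up, then return the chosen (start, end)
    pairs in chronological order (a key-fragment skim).

    Assumption: a simple greedy pass by descending score. This is a
    stand-in for whatever selection step a real system would use.
    """
    chosen: List[Tuple[int, int]] = []
    used = 0
    for start, end, _score in sorted(fragments, key=lambda f: f[2], reverse=True):
        length = end - start
        if used + length <= budget:
            chosen.append((start, end))
            used += length
    return sorted(chosen)


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # hypothetical per-frame scores
    print(storyboard(frame_scores, k=2))

    shots = [(0, 10, 0.2), (10, 30, 0.9), (30, 40, 0.5), (40, 70, 0.8)]
    print(skim(shots, budget=35))
```

The greedy pass keeps the example short. It can miss the best combination of fragments under the budget, so a system that needs an optimal selection would replace it with an exact method such as a knapsack solver.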

Papers

Showing 1-25 of 280 papers

Title | Status | Hype
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Code | 4
Egocentric Video-Language Pretraining | Code | 2
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video | Code | 2
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Code | 2
UniVTG: Towards Unified Video-Language Temporal Grounding | Code | 2
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | Code | 2
VideoSAGE: Video Summarization with Graph Representation Learning | Code | 2
Multi-modal Summarization for Video-containing Documents | Code | 1
LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN | Code | 1
Multimodal Summarization of User-Generated Videos | Code | 1
Learning Discriminative Prototypes with Dynamic Time Warping | Code | 1
Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score. | Code | 1
MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization | Code | 1
Movie Summarization via Sparse Graph Construction | Code | 1
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone | Code | 1
Do Language Models Understand Time? | Code | 1
Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization | Code | 1
Combining Global and Local Attention with Positional Encoding for Video Summarization | Code | 1
Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score | Code | 1
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization | Code | 1
Convolutional Hierarchical Attention Network for Query-Focused Video Summarization | Code | 1
Align and Attend: Multimodal Summarization with Dual Contrastive Losses | Code | 1
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | Code | 1
AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified
2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified
3 | DSNet | F1-score (Canonical) | 53 | | Unverified
4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified
5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified
6 | CSTA | Kendall's Tau | 0.25 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RR-STG | F1-score (Canonical) | 63 | | Unverified
2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified
3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified
4 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified
5 | M-AVS | F1-score (Canonical) | 61 | | Unverified
6 | CSTA | Kendall's Tau | 0.19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified
2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified
3 | SUM-shot | CIDEr | 8.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified
2 | EgoVLP | F1 (avg) | 49.72 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified
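
Several of the leaderboards above report an F1-score. As a rough illustration of how such a score can be computed for a summary, the sketch below measures the frame-level overlap between a predicted selection and a reference selection. It assumes both are binary per-frame vectors of equal length. The function name `overlap_f1` is hypothetical, and the exact evaluation protocol behind each listed result (for example, how multiple reference summaries are combined) is not specified on this page.

```python
from typing import Sequence


def overlap_f1(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """F1 between two binary per-frame selections (1 = frame is in the summary).

    precision = overlap / number of predicted frames
    recall    = overlap / number of reference frames
    """
    if len(predicted) != len(reference):
        raise ValueError("selections must cover the same number of frames")

    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if overlap == 0:
        # Also covers the case where either selection is empty.
        return 0.0

    precision = overlap / sum(1 for p in predicted if p)
    recall = overlap / sum(1 for r in reference if r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]  # hypothetical model selection
    ref = [1, 0, 0, 1, 1, 0]   # hypothetical annotator selection
    # Leaderboards typically show the value as a percentage.
    print(round(100 * overlap_f1(pred, ref), 2))
```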