SOTAVerified

Video Summarization

Video summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The summary is usually composed of either a set of representative video frames (a.k.a. video key-frames) or a set of video fragments (a.k.a. video key-fragments) that have been stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
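
The two summary types described above can be illustrated with a minimal sketch. It assumes that per-frame and per-fragment importance scores are already available (how they are predicted is what the listed papers differ on). The function names `storyboard` and `skim`, the `(start, end)` fragment representation, and the frame `budget` are hypothetical choices for illustration. The skim uses a simple greedy selection, which is a stand-in and not necessarily the selection procedure used by any particular paper.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames in chronological order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def skim(
    fragments: List[Tuple[int, int]],
    frag_scores: List[float],
    budget: int,
) -> List[Tuple[int, int]]:
    """Greedily pick high-scoring fragments within a frame budget.

    Each fragment is a (start, end) frame range with end exclusive.
    The chosen fragments are returned in chronological order, ready to
    be stitched into a shorter video.
    """
    order = sorted(range(len(fragments)), key=lambda i: frag_scores[i], reverse=True)
    chosen: List[Tuple[int, int]] = []
    used = 0
    for i in order:
        start, end = fragments[i]
        length = end - start
        if used + length <= budget:  # skip fragments that would exceed the budget
            chosen.append(fragments[i])
            used += length
    return sorted(chosen)


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2]
    print(storyboard(frame_scores, 2))

    frags = [(0, 10), (10, 25), (25, 30), (30, 50)]
    print(skim(frags, [0.2, 0.9, 0.7, 0.4], budget=25))
```

In both cases the selection is driven by importance, but the output is re-sorted by time, which is what keeps the storyboard or skim chronological.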

Papers

Showing 26–50 of 280 papers

Title | Status | Hype
Movie Summarization via Sparse Graph Construction | Code | 1
Multimodal Summarization of User-Generated Videos | Code | 1
Query-controllable Video Summarization | Code | 1
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos | Code | 1
Align and Attend: Multimodal Summarization with Dual Contrastive Losses | Code | 1
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization | Code | 1
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization | Code | 1
AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization | Code | 1
Discriminative Latent Semantic Graph for Video Captioning | Code | 1
Do Language Models Understand Time? | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
Convolutional Hierarchical Attention Network for Query-Focused Video Summarization | Code | 1
Self-Attention Recurrent Summarization Network with Reinforcement Learning for Video Summarization Task | Code | 1
Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames | Code | 1
Ultrasound Video Summarization using Deep Reinforcement Learning | Code | 1
VideoXum: Cross-modal Visual and Textural Summarization of Videos | Code | 1
Multi-Stream Dynamic Video Summarization | Code | 0
Query-adaptive Video Summarization via Quality-aware Relevance Estimation | Code | 0
Adaptive frame selection in two dimensional convolutional neural network action recognition | Code | 0
A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization | Code | 0
APES: Audiovisual Person Search in Untrimmed Video | Code | 0
Iterative Projection and Matching: Finding Structure-preserving Representatives and Its Application to Computer Vision | Code | 0
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified
2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified
3 | DSNet | F1-score (Canonical) | 53 | | Unverified
4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified
5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified
6 | CSTA | Kendall's Tau | 0.25 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RR-STG | F1-score (Canonical) | 63 | | Unverified
2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified
3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified
4 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified
5 | M-AVS | F1-score (Canonical) | 61 | | Unverified
6 | CSTA | Kendall's Tau | 0.19 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified
2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified
3 | SUM-shot | CIDEr | 8.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified
2 | EgoVLP | F1 (avg) | 49.72 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified
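
Several of the tables above report an F1-score. As a rough illustration only, the sketch below shows how an overlap-based F1 between a predicted summary and a reference summary is commonly computed from per-frame binary selections. The function name `overlap_f1` and the binary-mask input format are assumptions made for this example. The exact evaluation protocol behind each leaderboard entry (for instance how multiple reference summaries are aggregated) is not stated on this page and is not reproduced here.

```python
from typing import Sequence


def overlap_f1(pred: Sequence[int], ref: Sequence[int]) -> float:
    """F1 between two binary frame-selection masks of equal length.

    pred[i] / ref[i] is 1 if frame i is included in the predicted /
    reference summary, and 0 otherwise.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must cover the same number of frames")

    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0  # no shared frames: precision and recall are both zero

    precision = overlap / sum(1 for p in pred if p)  # share of predicted frames that match
    recall = overlap / sum(1 for r in ref if r)      # share of reference frames recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 1, 1, 0]
    print(overlap_f1(predicted, reference))
```

The function returns a fraction in [0, 1]; leaderboard values such as those in the tables are typically the same quantity expressed as a percentage.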