SOTAVerified

Video Summarization

Video summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The summary is usually composed of either a set of representative video frames (a.k.a. video key-frames) or a set of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.
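The difference between a storyboard and a skim can be illustrated with a minimal sketch. It assumes that per-frame importance scores and fragment boundaries are already available (for example, from a trained model and a shot detector); the function names `make_storyboard` and `make_skim`, the greedy selection rule, and the frame budget are illustrative assumptions, not a method taken from any paper listed on this page.

```python
from typing import List, Tuple


def make_storyboard(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames in chronological order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def make_skim(
    scores: List[float],
    fragments: List[Tuple[int, int]],
    budget: int,
) -> List[Tuple[int, int]]:
    """Pick fragments (start, end), end exclusive, under a total frame budget.

    Assumption: fragments are ranked by mean frame score and added greedily
    while they still fit; published methods often solve a knapsack instead.
    """
    def mean_score(frag: Tuple[int, int]) -> float:
        start, end = frag
        return sum(scores[start:end]) / (end - start)

    chosen: List[Tuple[int, int]] = []
    used = 0
    for frag in sorted(fragments, key=mean_score, reverse=True):
        length = frag[1] - frag[0]
        if used + length <= budget:
            chosen.append(frag)
            used += length
    # Stitch the selected fragments back in chronological order.
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical importance scores for an 8-frame video.
    scores = [0.1, 0.9, 0.8, 0.6, 0.3, 0.2, 0.95, 0.1]
    fragments = [(0, 2), (2, 4), (4, 6), (6, 8)]
    print(make_storyboard(scores, k=3))
    print(make_skim(scores, fragments, budget=4))
```

The storyboard returns individual frame indices, while the skim returns whole fragments, so the skim preserves short stretches of motion and audio at the cost of a longer summary.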

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET

Papers

Showing 76-100 of 280 papers

| Title | Status | Hype |
|---|---|---|
| UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos | Code | 0 |
| A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods | Code | 0 |
| CSTA: CNN-based Spatiotemporal Attention for Video Summarization | | 0 |
| "Previously on ..." From Recaps to Story Summarization | | 0 |
| An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0 |
| Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | | 0 |
| Pegasus-v1 Technical Report | | 0 |
| V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | | 0 |
| Cluster-based Video Summarization with Temporal Context Awareness | Code | 0 |
| Enhancing Video Summarization with Context Awareness | Code | 0 |
| Scaling Up Video Summarization Pretraining with Large Language Models | | 0 |
| R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | | 0 |
| R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | | 0 |
| FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts | | 0 |
| Large Model based Sequential Keyframe Extraction for Video Summarization | | 0 |
| Previously on ... From Recaps to Story Summarization | | 0 |
| Beyond the Frame: Single and mutilple video summarization method with user-defined length | | 0 |
| An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos | Code | 0 |
| Facilitating the Production of Well-tailored Video Summaries for Sharing on Social Media | | 0 |
| A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video | Code | 0 |
| Video Summarization: Towards Entity-Aware Captions | Code | 0 |
| Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames | | 0 |
| Conditional Modeling Based Automatic Video Summarization | | 0 |
| Unsupervised Video Summarization via Iterative Training and Simplified GAN | Code | 0 |
| Dynamic Non-monotone Submodular Maximization | | 0 |

Benchmark Results

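| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified |
| 2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified |
| 3 | DSNet | F1-score (Canonical) | 53 | | Unverified |
| 4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified |
| 5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.25 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RR-STG | F1-score (Canonical) | 63 | | Unverified |
| 2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified |
| 3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified |
| 4 | M-AVS | F1-score (Canonical) | 61 | | Unverified |
| 5 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified |
| 2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified |
| 3 | SUM-shot | CIDEr | 8.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified |
| 2 | EgoVLP | F1 (avg) | 49.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified |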
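
This page does not state how the F1-score entries above are computed. A common formulation in the video summarization literature measures the frame-level overlap between a predicted summary and a reference summary, and the sketch below follows that formulation as an assumption. The function name `summary_f1` and the binary per-frame encoding are illustrative, and the result is a fraction in [0, 1], whereas the tables appear to report percentages.

```python
from typing import Sequence


def summary_f1(pred: Sequence[int], ref: Sequence[int]) -> float:
    """F1 between two binary per-frame selection vectors of equal length.

    A value of 1 at position i means frame i is included in the summary.
    """
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same length")

    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    n_pred = sum(1 for p in pred if p)
    n_ref = sum(1 for r in ref if r)
    if overlap == 0 or n_pred == 0 or n_ref == 0:
        return 0.0

    precision = overlap / n_pred  # share of predicted frames that are in the reference
    recall = overlap / n_ref      # share of reference frames that were predicted
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical 6-frame video: both summaries select 3 frames, 2 in common.
    pred = [1, 1, 0, 0, 1, 0]
    ref = [1, 0, 0, 0, 1, 1]
    print(round(summary_f1(pred, ref), 4))
```

When a video has several reference summaries, benchmarks typically aggregate the per-reference scores (for example, by averaging or taking the maximum); which aggregation each listed result uses is not stated here.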