SOTAVerified

Video Summarization

Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The summary usually consists of either a set of representative video frames (a.k.a. video key-frames) or a set of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
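The storyboard/skim distinction described above can be illustrated with a minimal sketch. It assumes per-frame importance scores and fragment boundaries are already available (in practice they would come from a trained model and a shot or segment detector). The function names `storyboard` and `skim`, the greedy selection rule, and the frame budget are hypothetical choices for illustration, not the method of any paper listed on this page.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames in chronological order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def skim(
    fragments: List[Tuple[int, int]], scores: List[float], budget: int
) -> List[Tuple[int, int]]:
    """Greedily pick fragments (start, end-exclusive) by mean frame score
    until the frame budget is used up, then return them in chronological order.

    Assumption: greedy selection is a simplification chosen for brevity.
    """

    def mean_score(frag: Tuple[int, int]) -> float:
        start, end = frag
        return sum(scores[start:end]) / (end - start)

    chosen: List[Tuple[int, int]] = []
    used = 0
    for frag in sorted(fragments, key=mean_score, reverse=True):
        length = frag[1] - frag[0]
        if used + length <= budget:
            chosen.append(frag)
            used += length
    return sorted(chosen)  # stitch fragments back in chronological order


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.2, 0.4, 0.3, 0.7, 0.8, 0.6]
    print(storyboard(frame_scores, 3))
    print(skim([(0, 2), (2, 5), (5, 8)], frame_scores, budget=5))
```

The storyboard keeps individual frames, while the skim keeps contiguous fragments; both are re-sorted by time so the result follows the original video order.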

Papers

Showing 101-125 of 280 papers

Title | Status | Hype
Improving Sequential Determinantal Point Processes for Supervised Video Summarization | - | 0
Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization | - | 0
A Survey on Patch-based Synthesis: GPU Implementation and Optimization | - | 0
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos | - | 0
Image Conditioned Keyframe-Based Video Summarization Using Object Detection | - | 0
Human Pose Estimation using Motion Priors and Ensemble Models | - | 0
HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization | - | 0
Creating Summaries from User Videos | - | 0
How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization | - | 0
How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization | - | 0
Highlight Detection With Pairwise Deep Ranking for First-Person Video Summarization | - | 0
Co-Regularized Deep Representations for Video Summarization | - | 0
A Paradigm for Building Generalized Models of Human Image Perception Through Data Fusion | - | 0
Hierarchical Recurrent Neural Network for Video Summarization | - | 0
Hierarchical Multimodal Transformer to Summarize Videos | - | 0
Group Activity Recognition by Using Effective Multiple Modality Relation Representation With Temporal-Spatial Attention | - | 0
Conditional Modeling Based Automatic Video Summarization | - | 0
Key Frame Extraction with Attention Based Deep Neural Networks | - | 0
Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | - | 0
Large-Margin Determinantal Point Processes | - | 0
Large Model based Sequential Keyframe Extraction for Video Summarization | - | 0
Large-Scale Video Summarization Using Web-Image Priors | - | 0
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts | - | 0
Global-and-Local Relative Position Embedding for Unsupervised Video Summarization | - | 0
Generating Natural Language Summaries for Multimedia | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | F1-score (Canonical) | 55.6 | - | Unverified
2 | RR-STG | F1-score (Canonical) | 54.5 | - | Unverified
3 | DSNet | F1-score (Canonical) | 53 | - | Unverified
4 | VASNet | F1-score (Canonical) | 49.71 | - | Unverified
5 | M-AVS | F1-score (Canonical) | 44.4 | - | Unverified
6 | CSTA | Kendall's Tau | 0.25 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RR-STG | F1-score (Canonical) | 63 | - | Unverified
2 | DSNet | F1-score (Canonical) | 62.1 | - | Unverified
3 | VASNet | F1-score (Canonical) | 61.42 | - | Unverified
4 | M-AVS | F1-score (Canonical) | 61 | - | Unverified
5 | PGL-SUM | F1-score (Canonical) | 61 | - | Unverified
6 | CSTA | Kendall's Tau | 0.19 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | - | Unverified
2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | - | Unverified
3 | SUM-shot | CIDEr | 8.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EgoVLPv2 | F1 (avg) | 52.08 | - | Unverified
2 | EgoVLP | F1 (avg) | 49.72 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | MAP (50%) | 61.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | - | Unverified
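
Several entries above report an F1-score. As a rough illustration only, the sketch below computes an F1 value from the frame-level overlap between a predicted summary and a single reference summary, both given as binary selection vectors. The function name `summary_f1` and the input format are assumptions made for this example; it does not attempt to reproduce the "Canonical" protocol or any specific benchmark's evaluation script.

```python
from typing import Sequence


def summary_f1(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """F1 between two binary frame-selection vectors (1 = frame is in the summary)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must cover the same frames")

    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    n_pred = sum(1 for p in predicted if p)
    n_ref = sum(1 for r in reference if r)

    if overlap == 0:
        return 0.0  # also covers empty predictions or references

    precision = overlap / n_pred  # share of selected frames that are correct
    recall = overlap / n_ref      # share of reference frames that were selected
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    ref = [1, 0, 0, 1, 1, 0]
    print(f"F1 = {summary_f1(pred, ref):.3f}")
```

The function returns a value between 0 and 1, whereas the claimed F1 values in the tables appear to be on a 0 to 100 scale.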