SOTAVerified

Video Summarization

Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The summary is usually composed of a set of representative video frames (a.k.a. video key-frames), or of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
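The difference between a storyboard and a skim can be illustrated with a minimal sketch. It assumes per-frame importance scores are already available (in practice they would come from a trained model) and uses plain top-k selection. The function names `storyboard` and `skim`, the fixed fragment boundaries, and the top-k rule are all illustrative assumptions, not the method of any paper listed here; published approaches often select fragments under a summary-length budget instead.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring frames, in chronological order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def skim(scores: List[float],
         fragments: List[Tuple[int, int]],
         k: int) -> List[Tuple[int, int]]:
    """Return the k fragments with the highest mean frame score.

    Each fragment is a (start, end) pair with `end` exclusive. The selected
    fragments are returned in chronological order, ready to be stitched.
    """
    def mean_score(frag: Tuple[int, int]) -> float:
        start, end = frag
        return sum(scores[start:end]) / (end - start)

    top = sorted(fragments, key=mean_score, reverse=True)[:k]
    return sorted(top)


# Hypothetical importance scores for a six-frame video.
frame_scores = [0.1, 0.9, 0.2, 0.4, 0.6, 0.8]
# Hypothetical fragment boundaries (e.g. from shot detection).
shots = [(0, 2), (2, 4), (4, 6)]

key_frames = storyboard(frame_scores, k=2)
key_fragments = skim(frame_scores, shots, k=2)
```

Here `key_frames` holds frame indices (a storyboard) while `key_fragments` holds frame ranges (a skim); both are sorted by time so the summary preserves the original ordering.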

Papers

Showing 101-150 of 280 papers

| Title | Status | Hype |
|---|---|---|
| DeVAn: Dense Video Annotation for Video-Language Models | Code | 0 |
| Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling | | 0 |
| Mr. HiSum: A Large-scale Dataset for Video Highlight Detection and Summarization | | 0 |
| Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization | | 0 |
| Saliency-based Video Summarization for Face Anti-spoofing | | 0 |
| Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization | | 0 |
| Query-based Video Summarization with Pseudo Label Supervision | | 0 |
| Causal Video Summarizer for Video Exploration | | 0 |
| Key Frame Extraction with Attention Based Deep Neural Networks | | 0 |
| Masked Autoencoder for Unsupervised Video Summarization | | 0 |
| Motion-Based Sign Language Video Summarization using Curvature and Torsion | | 0 |
| Causalainer: Causal Explainer for Automatic Video Summarization | | 0 |
| SELF-VS: Self-supervised Encoding Learning For Video Summarization | Code | 0 |
| Learning to Summarize Videos by Contrasting Clips | | 0 |
| Adaptive frame selection in two dimensional convolutional neural network action recognition | Code | 0 |
| Role of Audio in Audio-Visual Video Summarization | | 0 |
| Video Summarization Overview | | 0 |
| TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency | | 0 |
| Multimodal Frame-Scoring Transformer for Video Summarization | | 0 |
| Multimodal Intent Discovery from Livestream Videos | | 0 |
| A Multi-stage deep architecture for summary generation of soccer videos | | 0 |
| Towards Practical and Efficient Long Video Summary | Code | 0 |
| Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization | | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | | 0 |
| Exploring Global Diversity and Local Context for Video Summarization | | 0 |
| Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer | | 0 |
| Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment | | 0 |
| A Stacking Ensemble Approach for Supervised Video Summarization | | 0 |
| Hierarchical Multimodal Transformer to Summarize Videos | | 0 |
| ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN | Code | 0 |
| Unsupervised multi-latent space reinforcement learning framework for video summarization in ultrasound imaging | Code | 0 |
| Use of Affective Visual Information for Summarization of Human-Centric Videos | | 0 |
| CLIP-It! Language-Guided Video Summarization | Code | 0 |
| Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net | | 0 |
| APES: Audiovisual Person Search in Untrimmed Video | Code | 0 |
| Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network | | 0 |
| AudioVisual Video Summarization | | 0 |
| DeepQAMVS: Query-Aware Hierarchical Pointer Networks for Multi-Video Summarization | | 0 |
| Reconstructive Sequence-Graph Network for Video Summarization | | 0 |
| GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization | Code | 0 |
| The Power of Subsampling in Submodular Maximization | | 0 |
| Distance Metric-Based Learning with Interpolated Latent Features for Location Classification in Endoscopy Image and Video | | 0 |
| Visual Question Answering: which investigated applications? | Code | 0 |
| Audiovisual Highlight Detection in Videos | | 0 |
| Efficient Video Summarization Framework using EEG and Eye-tracking Signals | | 0 |
| How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization | | 0 |
| Video Summarization: Study of various techniques | | 0 |
| Video Summarization Using Deep Neural Networks: A Survey | | 0 |
| Multiple Pairwise Ranking Networks for Personalized Video Summarization | | 0 |
| A New Action Recognition Framework for Video Highlights Summarization in Sporting Events | | 0 |

Benchmark Results

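| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified |
| 2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified |
| 3 | DSNet | F1-score (Canonical) | 53 | | Unverified |
| 4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified |
| 5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.25 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RR-STG | F1-score (Canonical) | 63 | | Unverified |
| 2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified |
| 3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified |
| 4 | M-AVS | F1-score (Canonical) | 61 | | Unverified |
| 5 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified |
| 2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified |
| 3 | SUM-shot | CIDEr | 8.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified |
| 2 | EgoVLP | F1 (avg) | 49.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified |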