SOTAVerified

Video Summarization

Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The produced summary is usually composed of a set of representative video frames (a.k.a. video key-frames), or of video fragments (a.k.a. video key-fragments) that have been stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey

Image credit: iJRASET
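The two summary types described above can be illustrated with a minimal Python sketch. It assumes that per-frame importance scores and fragment boundaries are already available (in practice a trained model and a shot-segmentation step would produce them). The function names `storyboard` and `skim` are hypothetical, and the greedy fragment selection is a simplification chosen for brevity, not the method of any paper listed on this page.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames, in chronological order."""
    if k <= 0:
        return []
    # Rank frame indices by importance score, highest first, and keep the top k.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Key-frames are presented in the order they appear in the video.
    return sorted(top)


def skim(
    scores: List[float],
    fragments: List[Tuple[int, int]],
    budget: int,
) -> List[Tuple[int, int]]:
    """Greedily pick fragments (start, end) with end exclusive, by mean frame
    score, until the frame budget is used up; return them in chronological order."""

    def mean_score(frag: Tuple[int, int]) -> float:
        start, end = frag
        return sum(scores[start:end]) / (end - start)

    chosen: List[Tuple[int, int]] = []
    used = 0
    for frag in sorted(fragments, key=mean_score, reverse=True):
        length = frag[1] - frag[0]
        if used + length <= budget:
            chosen.append(frag)
            used += length
    # Stitch the selected key-fragments back together in chronological order.
    return sorted(chosen)


# Toy input: six frames with made-up importance scores, split into three fragments.
frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
frame_fragments = [(0, 2), (2, 4), (4, 6)]

print(storyboard(frame_scores, 2))               # [1, 3]
print(skim(frame_scores, frame_fragments, 4))    # [(0, 2), (2, 4)]
```

In both cases the selection is driven by importance, while the final ordering follows the video timeline, which is what distinguishes a summary from a ranked list of highlights.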

Papers

Showing 51-100 of 280 papers

| Title | Status | Hype |
|---|---|---|
| R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0 |
| R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0 |
| FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts | | 0 |
| Large Model based Sequential Keyframe Extraction for Video Summarization | | 0 |
| ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video | Code | 2 |
| Previously on ... From Recaps to Story Summarization | | 0 |
| Beyond the Frame: Single and mutilple video summarization method with user-defined length | | 0 |
| Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1 |
| An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos | Code | 0 |
| Facilitating the Production of Well-tailored Video Summaries for Sharing on Social Media | | 0 |
| A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video | Code | 0 |
| Video Summarization: Towards Entity-Aware Captions | Code | 0 |
| Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames | | 0 |
| Conditional Modeling Based Automatic Video Summarization | | 0 |
| Unsupervised Video Summarization via Iterative Training and Simplified GAN | Code | 0 |
| Dynamic Non-monotone Submodular Maximization | | 0 |
| DeVAn: Dense Video Annotation for Video-Language Models | Code | 0 |
| Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling | | 0 |
| Mr. HiSum: A Large-scale Dataset for Video Highlight Detection and Summarization | | 0 |
| Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization | | 0 |
| Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score. | Code | 1 |
| Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score | Code | 1 |
| Saliency-based Video Summarization for Face Anti-spoofing | | 0 |
| UniVTG: Towards Unified Video-Language Temporal Grounding | Code | 2 |
| Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization | | 0 |
| EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone | Code | 1 |
| Causal Video Summarizer for Video Exploration | | 0 |
| Query-based Video Summarization with Pseudo Label Supervision | | 0 |
| Key Frame Extraction with Attention Based Deep Neural Networks | | 0 |
| MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos | Code | 1 |
| Masked Autoencoder for Unsupervised Video Summarization | | 0 |
| Motion-Based Sign Language Video Summarization using Curvature and Torsion | | 0 |
| Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | Code | 1 |
| Causalainer: Causal Explainer for Automatic Video Summarization | | 0 |
| Hierarchical Video-Moment Retrieval and Step-Captioning | Code | 1 |
| SELF-VS: Self-supervised Encoding Learning For Video Summarization | Code | 0 |
| VideoXum: Cross-modal Visual and Textural Summarization of Videos | Code | 1 |
| Align and Attend: Multimodal Summarization with Dual Contrastive Losses | Code | 1 |
| VideoSum: A Python Library for Surgical Video Summarization | Code | 1 |
| Learning to Summarize Videos by Contrasting Clips | | 0 |
| Adaptive frame selection in two dimensional convolutional neural network action recognition | Code | 0 |
| Role of Audio in Audio-Visual Video Summarization | | 0 |
| Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization | Code | 1 |
| Video Summarization Overview | | 0 |
| TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency | | 0 |
| Multimodal Frame-Scoring Transformer for Video Summarization | | 0 |
| Multimodal Intent Discovery from Livestream Videos | | 0 |
| Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames | Code | 1 |
| Egocentric Video-Language Pretraining | Code | 2 |
| A Multi-stage deep architecture for summary generation of soccer videos | | 0 |

Benchmark Results

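| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified |
| 2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified |
| 3 | DSNet | F1-score (Canonical) | 53 | | Unverified |
| 4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified |
| 5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.25 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RR-STG | F1-score (Canonical) | 63 | | Unverified |
| 2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified |
| 3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified |
| 4 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified |
| 5 | M-AVS | F1-score (Canonical) | 61 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified |
| 2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified |
| 3 | SUM-shot | CIDEr | 8.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified |
| 2 | EgoVLP | F1 (avg) | 49.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified |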