SOTAVerified

Video Summarization

Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The produced summary is usually composed of a set of representative video frames (a.k.a. video key-frames), or of video fragments (a.k.a. video key-fragments) that have been stitched in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
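The two summary types described above can be illustrated with a minimal sketch. It assumes that per-frame or per-fragment importance scores are already available (in practice they would come from a trained model), and the function names `storyboard` and `skim`, the greedy selection rule, and the frame budget are illustrative assumptions rather than the method of any paper listed here.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames in chronological order."""
    # Rank frame indices by importance score, highest first, and keep k of them.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # A storyboard presents key-frames in the order they occur in the video.
    return sorted(top)


def skim(fragments: List[Tuple[int, int, float]], budget: int) -> List[Tuple[int, int]]:
    """Greedily select (start, end, score) fragments within a frame budget.

    `end` is exclusive. This is an assumed greedy heuristic, not an optimal
    knapsack solution. Selected fragments are returned in chronological order.
    """
    chosen = []
    used = 0
    for start, end, _score in sorted(fragments, key=lambda f: f[2], reverse=True):
        length = end - start
        if used + length <= budget:
            chosen.append((start, end))
            used += length
    # Stitching the fragments in chronological order yields the video skim.
    return sorted(chosen)


if __name__ == "__main__":
    print(storyboard([0.1, 0.9, 0.3, 0.8, 0.2], k=2))
    print(skim([(0, 10, 0.2), (10, 25, 0.9), (25, 30, 0.7), (30, 50, 0.5)], budget=20))
```

Both functions sort their final selection by position, which reflects the chronological ordering mentioned in the definition.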

Papers

Showing 151–200 of 280 papers

| Title | Status | Hype |
|---|---|---|
| Text Synopsis Generation for Egocentric Videos | | 0 |
| The Power of Subsampling in Submodular Maximization | | 0 |
| TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency | | 0 |
| Transforming Multi-Concept Attention into Video Summarization | | 0 |
| TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness | | 0 |
| TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations | | 0 |
| TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation | | 0 |
| TVSum: Summarizing Web Videos Using Titles | | 0 |
| Understanding the Predictability of Gesture Parameters from Speech and their Perceptual Importance | | 0 |
| Unsupervised Object-Level Video Summarization with Online Motion Auto-Encoder | | 0 |
| Unsupervised Transcript-assisted Video Summarization and Highlight Detection | | 0 |
| Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator | | 0 |
| Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network | | 0 |
| Use of Affective Visual Information for Summarization of Human-Centric Videos | | 0 |
| V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | | 0 |
| Video Co-Summarization: Video Summarization by Visual Co-Occurrence | | 0 |
| Video Object Segmentation and Tracking: A Survey | | 0 |
| Video Skimming: Taxonomy and Comprehensive Survey | | 0 |
| Video Summarization by Learning Submodular Mixtures of Objectives | | 0 |
| Video Summarization in a Multi-View Camera Network | | 0 |
| Video Summarization Overview | | 0 |
| Video Summarization: Study of various techniques | | 0 |
| Video Summarization Techniques: A Comprehensive Review | | 0 |
| A Mobile Robot Generating Video Summaries of Seniors' Indoor Activities | | 0 |
| Video Summarization through Reinforcement Learning with a 3D Spatio-Temporal U-Net | | 0 |
| Video Summarization Using Deep Neural Networks: A Survey | | 0 |
| Video Summarization using Denoising Diffusion Probabilistic Model | | 0 |
| Video Summarization Using Fully Convolutional Sequence Networks | | 0 |
| Video Summarization via Actionness Ranking | | 0 |
| Video Summarization with Attention-Based Encoder-Decoder Networks | | 0 |
| Video Summarization with Large Language Models | | 0 |
| Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling | | 0 |
| Viewpoint-aware Video Summarization | | 0 |
| Visual Recognition by Counting Instances: A Multi-Instance Cardinality Potential Kernel | | 0 |
| Visual Summarization of Scholarly Videos using Word Embeddings and Keyphrase Extraction | | 0 |
| VSCAN: An Enhanced Video Summarization using Density-based Spatial Clustering | | 0 |
| Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning | | 0 |
| Multi-view Metric Learning for Multi-view Video Summarization | | 0 |
| Multi-View Surveillance Video Summarization via Joint Embedding and Sparse Optimization | | 0 |
| NEWSKVQA: Knowledge-Aware News Video Question Answering | | 0 |
| NLP Driven Ensemble Based Automatic Subtitle Generation and Semantic Video Summarization Technique | | 0 |
| Non-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings | | 0 |
| Online Learnable Keyframe Extraction in Videos and its Application with Semantic Word Vector in Action Recognition | | 0 |
| Online Summarization via Submodular and Convex Optimization | | 0 |
| Pack and Detect: Fast Object Detection in Videos Using Region-of-Interest Packing | | 0 |
| Parameter-free Video Segmentation for Vision and Language Understanding | | 0 |
| Pegasus-v1 Technical Report | | 0 |
| Personalized Video Summarization by Multimodal Video Understanding | | 0 |
| Personalized Video Summarization using Text-Based Queries and Conditional Modeling | | 0 |
| Predicting Important Objects for Egocentric Video Summarization | | 0 |

Benchmark Results

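| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified |
| 2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified |
| 3 | DSNet | F1-score (Canonical) | 53 | | Unverified |
| 4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified |
| 5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.25 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RR-STG | F1-score (Canonical) | 63 | | Unverified |
| 2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified |
| 3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified |
| 4 | M-AVS | F1-score (Canonical) | 61 | | Unverified |
| 5 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.19 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified |
| 2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified |
| 3 | SUM-shot | CIDEr | 8.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified |
| 2 | EgoVLP | F1 (avg) | 49.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified |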
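
Several of the results above are reported as an F1-score or as Kendall's Tau. As a rough illustration of how such metrics are commonly computed in the video summarization literature, the sketch below scores a predicted summary against a reference. It rests on assumptions: summaries are given as binary per-frame selections, importance scores as plain lists of floats, and the function names `summary_f1` and `kendall_tau_a` are hypothetical. The tau shown is the simple tau-a variant without tie correction, and the evaluation protocol behind the "Canonical" label is not modelled.

```python
from typing import Sequence


def summary_f1(pred: Sequence[int], ref: Sequence[int]) -> float:
    """F1 of the frame overlap between a predicted and a reference summary.

    Both inputs are binary per-frame selections (1 = frame is in the summary).
    """
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(ref)
    return 2 * precision * recall / (precision + recall)


def kendall_tau_a(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall's tau-a rank correlation between two equally long score lists."""
    n = len(a)
    if n < 2 or n != len(b):
        raise ValueError("need two sequences of equal length >= 2")
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie.
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(summary_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(kendall_tau_a([0.1, 0.4, 0.3, 0.9], [0.2, 0.3, 0.5, 0.8]))
```

The F1 function returns a fraction between 0 and 1, whereas the F1 values in the tables appear to be percentages, so a result would need to be multiplied by 100 to be comparable.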