SOTAVerified

Video Summarization

Video summarization aims to generate a short synopsis of a video's content by selecting its most informative and important parts. The summary is usually composed either of a set of representative video frames (a.k.a. video key-frames) or of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of video summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
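To make the storyboard/skim distinction concrete, here is a minimal Python sketch. It assumes that some model has already assigned an importance score to every frame and that the video has already been split into candidate fragments (for example at shot boundaries). The function names `storyboard` and `skim` are hypothetical, and choosing fragments with a 0/1 knapsack under a length budget is one common selection strategy, not the only one.

```python
from typing import List, Sequence, Tuple


def storyboard(frame_scores: Sequence[float], k: int) -> List[int]:
    """Key-frame summary: the k highest-scoring frames, returned in chronological order."""
    ranked = sorted(range(len(frame_scores)), key=lambda i: frame_scores[i], reverse=True)
    return sorted(ranked[:k])  # re-sort by frame index to restore temporal order


def skim(
    fragments: Sequence[Tuple[int, int]],
    frame_scores: Sequence[float],
    budget: int,
) -> List[Tuple[int, int]]:
    """Key-fragment summary: pick fragments (start, end inclusive) that maximize the
    total frame score while keeping the total length <= budget frames (0/1 knapsack)."""
    lengths = [end - start + 1 for start, end in fragments]
    values = [sum(frame_scores[start:end + 1]) for start, end in fragments]
    n = len(fragments)

    # best[i][c] = highest total score using the first i fragments within c frames
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], values[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip fragment i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it

    # Backtrack: a fragment was taken wherever the table value changed.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]

    # Stitch the selected fragments back together in chronological order.
    return [fragments[i] for i in sorted(chosen)]
```

Here `budget` is a frame count; in practice it is often set to a fixed fraction of the video length (15% is a common choice in the literature). Both functions return their selections sorted by time, which mirrors the requirement that key-frames or key-fragments be presented in chronological order.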

Papers

Showing 51-100 of 280 papers

| Title | Status | Hype |
|---|---|---|
| SD-VSum: A Method and Dataset for Script-Driven Video Summarization | Code | 0 |
| Video Summarization with Large Language Models | | 0 |
| Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | | 0 |
| FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | | 0 |
| A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts | | 0 |
| Parameter-free Video Segmentation for Vision and Language Understanding | | 0 |
| CFSum: A Transformer-Based Multi-Modal Video Summarization Framework With Coarse-Fine Fusion | | 0 |
| Integrate the temporal scheme for unsupervised video summarization via attention mechanism | Code | 0 |
| Reinforcement Learning for Ultrasound Image Analysis: A Comprehensive Review of Advances and Applications | | 0 |
| What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | Code | 0 |
| FullTransNet: Full Transformer with Local-Global Attention for Video Summarization | | 0 |
| Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning | | 0 |
| Agent-based Video Trimming | | 0 |
| Video Summarization using Denoising Diffusion Probabilistic Model | | 0 |
| Personalized Video Summarization by Multimodal Video Understanding | | 0 |
| Your Interest, Your Summaries: Query-Focused Long Video Summarization | Code | 0 |
| Exploring Efficient Foundational Multi-modal Models for Video Summarization | | 0 |
| Realizing Video Summarization from the Path of Language-based Semantic Understanding | | 0 |
| Video Summarization Techniques: A Comprehensive Review | | 0 |
| Does SpatioTemporal information benefit Two video summarization benchmarks? | Code | 0 |
| EDSNet: Efficient-DSNet for Video Summarization | | 0 |
| Personalized Video Summarization using Text-Based Queries and Conditional Modeling | | 0 |
| EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | | 0 |
| Multimodal Language Models for Domain-Specific Procedural Video Summarization | | 0 |
| Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator | | 0 |
| UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos | Code | 0 |
| A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods | Code | 0 |
| CSTA: CNN-based Spatiotemporal Attention for Video Summarization | Code | 0 |
| "Previously on ..." From Recaps to Story Summarization | | 0 |
| An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0 |
| Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | | 0 |
| Pegasus-v1 Technical Report | | 0 |
| V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | | 0 |
| Cluster-based Video Summarization with Temporal Context Awareness | Code | 0 |
| Enhancing Video Summarization with Context Awareness | Code | 0 |
| Scaling Up Video Summarization Pretraining with Large Language Models | | 0 |
| R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | Code | 0 |
| FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts | | 0 |
| Large Model based Sequential Keyframe Extraction for Video Summarization | | 0 |
| Previously on ... From Recaps to Story Summarization | | 0 |
| Beyond the Frame: Single and multiple video summarization method with user-defined length | | 0 |
| An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos | Code | 0 |
| Facilitating the Production of Well-tailored Video Summaries for Sharing on Social Media | | 0 |
| A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video | Code | 0 |
| Video Summarization: Towards Entity-Aware Captions | Code | 0 |
| Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames | | 0 |
| Conditional Modeling Based Automatic Video Summarization | | 0 |
| Unsupervised Video Summarization via Iterative Training and Simplified GAN | Code | 0 |
| Dynamic Non-monotone Submodular Maximization | | 0 |

Benchmark Results

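| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | F1-score (Canonical) | 55.6 | | Unverified |
| 2 | RR-STG | F1-score (Canonical) | 54.5 | | Unverified |
| 3 | DSNet | F1-score (Canonical) | 53 | | Unverified |
| 4 | VASNet | F1-score (Canonical) | 49.71 | | Unverified |
| 5 | M-AVS | F1-score (Canonical) | 44.4 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.25 | | Unverified |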
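
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | RR-STG | F1-score (Canonical) | 63 | | Unverified |
| 2 | DSNet | F1-score (Canonical) | 62.1 | | Unverified |
| 3 | VASNet | F1-score (Canonical) | 61.42 | | Unverified |
| 4 | M-AVS | F1-score (Canonical) | 61 | | Unverified |
| 5 | PGL-SUM | F1-score (Canonical) | 61 | | Unverified |
| 6 | CSTA | Kendall's Tau | 0.19 | | Unverified |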

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | | Unverified |
| 2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | | Unverified |
| 3 | SUM-shot | CIDEr | 8.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EgoVLPv2 | F1 (avg) | 52.08 | | Unverified |
| 2 | EgoVLP | F1 (avg) | 49.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PGL-SUM | MAP (50%) | 61.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | | Unverified |
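Most rows in the benchmark tables report an F1-score. As a rough illustration of what such a number measures, the Python sketch below computes a frame-level F1 between a predicted skim and a reference (user-made) skim, both encoded as 0/1 masks over the video's frames. This is an assumption about the general form of the metric: the exact protocol behind the "Canonical" figures (data splits, and how scores against multiple annotators are combined, e.g. by mean or maximum) is defined by each benchmark and is not reproduced here. The function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence


def summary_f1(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Frame-level F1 between two binary summaries (1 = frame is in the summary)."""
    if len(pred) != len(ref):
        raise ValueError("both summaries must cover the same frames")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0  # no shared frames: precision and recall are both 0
    precision = overlap / sum(pred)  # share of predicted frames the user also kept
    recall = overlap / sum(ref)      # share of the user's frames that were predicted
    return 2 * precision * recall / (precision + recall)


def f1_against_users(
    pred: Sequence[int],
    user_summaries: Iterable[Sequence[int]],
    aggregate: Callable[[Iterable[float]], float] = mean,
) -> float:
    """Score one prediction against several annotators and combine the per-user F1 values."""
    return aggregate([summary_f1(pred, user) for user in user_summaries])
```

Leaderboards usually report this value multiplied by 100 (so 0.556 would appear as 55.6). Passing `aggregate=max` instead of the default `mean` switches to crediting the prediction with its best match among the annotators.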