Video Summarization

Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The summary usually consists of either a set of representative video frames (a.k.a. video key-frames) or a set of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
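The two summary types described above can be illustrated with a minimal sketch. It assumes that per-frame or per-fragment importance scores are already available from some model. The function names `storyboard` and `skim`, the score values, and the frame budget are hypothetical, and the greedy fragment selection is a simplification chosen for brevity, not the method of any paper listed here.

```python
from typing import List, Tuple


def storyboard(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames in chronological order."""
    # Rank frame indices by importance score, highest first, and keep the top k.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the selected indices so the key-frames follow the video timeline.
    return sorted(top)


def skim(fragments: List[Tuple[int, int, float]], budget: int) -> List[Tuple[int, int]]:
    """Greedily select (start, end) fragments that fit within a frame budget.

    Each fragment is (start_frame, end_frame, score), with end_frame exclusive.
    This greedy rule is an illustrative assumption, not an optimal selection.
    """
    chosen: List[Tuple[int, int]] = []
    used = 0
    # Visit fragments from most to least important.
    for start, end, _score in sorted(fragments, key=lambda f: f[2], reverse=True):
        length = end - start
        if used + length <= budget:
            chosen.append((start, end))
            used += length
    # Stitch the selected fragments back together in chronological order.
    return sorted(chosen)


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2]  # hypothetical importance scores
    print(storyboard(frame_scores, k=2))

    shots = [(0, 10, 0.2), (10, 25, 0.9), (25, 30, 0.7), (30, 50, 0.4)]  # hypothetical
    print(skim(shots, budget=20))
```

Here `storyboard` yields key-frame indices and `skim` yields key-fragment boundaries, both ordered by time as in the definition above.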

Papers

Showing 1-50 of 280 papers

Title | Status | Hype
TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness | - | 0
MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment | - | 0
Prompts to Summaries: Zero-Shot Language-Guided Video Summarization | - | 0
Enhancing Video Memorability Prediction with Text-Motion Cross-modal Contrastive Loss and Its Application in Video Summarization | - | 0
TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations | - | 0
Unsupervised Transcript-assisted Video Summarization and Highlight Detection | - | 0
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing | - | 0
SD-VSum: A Method and Dataset for Script-Driven Video Summarization | Code | 0
Video Summarization with Large Language Models | - | 0
Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention | - | 0
FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | - | 0
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts | - | 0
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant | Code | 2
Parameter-free Video Segmentation for Vision and Language Understanding | - | 0
CFSum: A Transformer-Based Multi-Modal Video Summarization Framework With Coarse-Fine Fusion | - | 0
Integrate the temporal scheme for unsupervised video summarization via attention mechanism | Code | 0
Reinforcement Learning for Ultrasound Image Analysis A Comprehensive Review of Advances and Applications | - | 0
What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations | Code | 0
FullTransNet: Full Transformer with Local-Global Attention for Video Summarization | - | 0
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning | - | 0
Do Language Models Understand Time? | Code | 1
Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark | Code | 1
Agent-based Video Trimming | - | 0
Video Summarization using Denoising Diffusion Probabilistic Model | - | 0
Personalized Video Summarization by Multimodal Video Understanding | - | 0
Your Interest, Your Summaries: Query-Focused Long Video Summarization | Code | 0
Exploring Efficient Foundational Multi-modal Models for Video Summarization | - | 0
Realizing Video Summarization from the Path of Language-based Semantic Understanding | - | 0
Video Summarization Techniques: A Comprehensive Review | - | 0
Does SpatioTemporal information benefit Two video summarization benchmarks? | Code | 0
EDSNet: Efficient-DSNet for Video Summarization | - | 0
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Code | 4
Personalized Video Summarization using Text-Based Queries and Conditional Modeling | - | 0
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | - | 0
Multimodal Language Models for Domain-Specific Procedural Video Summarization | - | 0
Unsupervised Video Summarization via Reinforcement Learning and a Trained Evaluator | - | 0
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos | Code | 0
A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods | Code | 0
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization | Code | 1
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | Code | 2
CSTA: CNN-based Spatiotemporal Attention for Video Summarization | Code | 0
"Previously on ..." From Recaps to Story Summarization | - | 0
An Integrated Framework for Multi-Granular Explanation of Video Summarization | Code | 0
Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | - | 0
Pegasus-v1 Technical Report | - | 0
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | - | 0
VideoSAGE: Video Summarization with Graph Representation Learning | Code | 2
Enhancing Video Summarization with Context Awareness | Code | 0
Cluster-based Video Summarization with Temporal Context Awareness | Code | 0
Scaling Up Video Summarization Pretraining with Large Language Models | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | F1-score (Canonical) | 55.6 | - | Unverified
2 | RR-STG | F1-score (Canonical) | 54.5 | - | Unverified
3 | DSNet | F1-score (Canonical) | 53 | - | Unverified
4 | VASNet | F1-score (Canonical) | 49.71 | - | Unverified
5 | M-AVS | F1-score (Canonical) | 44.4 | - | Unverified
6 | CSTA | Kendall's Tau | 0.25 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RR-STG | F1-score (Canonical) | 63 | - | Unverified
2 | DSNet | F1-score (Canonical) | 62.1 | - | Unverified
3 | VASNet | F1-score (Canonical) | 61.42 | - | Unverified
4 | M-AVS | F1-score (Canonical) | 61 | - | Unverified
5 | PGL-SUM | F1-score (Canonical) | 61 | - | Unverified
6 | CSTA | Kendall's Tau | 0.19 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shotluck-Holmes (3.1B) | CIDEr | 152.3 | - | Unverified
2 | Shotluck-Holmes (3.1B) | CIDEr | 63.2 | - | Unverified
3 | SUM-shot | CIDEr | 8.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | EgoVLPv2 | F1 (avg) | 52.08 | - | Unverified
2 | EgoVLP | F1 (avg) | 49.72 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | PGL-SUM | MAP (50%) | 61.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTSUM-BLIP | 1 shot Micro-F1 | 23.5 | - | Unverified