SOTAVerified

Supervised Video Summarization

Supervised video summarization methods rely on datasets with human-labeled ground-truth annotations (either complete video summaries, as in the SumMe dataset, or frame-level importance scores, as in the TVSum dataset), from which they try to learn the underlying criterion for selecting video frames or fragments and producing a summary.

Source: Video Summarization Using Deep Neural Networks: A Survey
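The supervised setup described above can be sketched as a regression from per-frame features to annotated importance scores, followed by top-score frame selection. This is a minimal illustrative sketch with synthetic data and a linear scorer standing in for a neural model; the shapes, the 15% budget, and the linear rule are assumptions for illustration, not any specific published method.

```python
import numpy as np

# Minimal sketch of supervised summarization: learn to map per-frame
# features to human-annotated importance scores (TVSum-style labels).
# The linear model and all shapes are illustrative assumptions.

rng = np.random.default_rng(0)

# Toy "dataset": 200 frames with 16-dim features. Ground-truth
# importance comes from a hidden linear rule plus noise.
X = rng.normal(size=(200, 16))
w_true = rng.normal(size=16)
y = X @ w_true + 0.1 * rng.normal(size=200)  # frame-level importance

# Fit a linear scorer by least squares (stand-in for a trained network).
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Summarize: keep the top 15% highest-scoring frames.
scores = X @ w_hat
k = int(0.15 * len(scores))
summary_idx = np.argsort(scores)[-k:]
print(len(summary_idx))  # → 30
```

At inference time a real system would extract features with a pretrained backbone and often select contiguous fragments under a length budget (e.g. via knapsack over shot scores) rather than independent frames.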

Papers

Showing 1–10 of 28 papers

| Title | Status | Hype |
| --- | --- | --- |
| TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness | | 0 |
| FullTransNet: Full Transformer with Local-Global Attention for Video Summarization | | 0 |
| CSTA: CNN-based Spatiotemporal Attention for Video Summarization | Code | 0 |
| Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video | | 0 |
| Align and Attend: Multimodal Summarization with Dual Contrastive Losses | Code | 1 |
| Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization | | 0 |
| Progressive Video Summarization via Multimodal Self-supervised Learning | Code | 1 |
| Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer | | 0 |
| Video Joint Modelling Based on Hierarchical Transformer for Co-summarization | Code | 1 |
| Combining Global and Local Attention with Positional Encoding for Video Summarization | Code | 1 |

No leaderboard results yet.