
Supervised Video Summarization

Supervised video summarization methods rely on datasets with human-labeled ground-truth annotations, either video summaries (as in the SumMe dataset) or frame-level importance scores (as in the TVSum dataset). From these annotations they try to learn the underlying criterion for selecting video frames or fragments and assembling a summary.

Source: Video Summarization Using Deep Neural Networks: A Survey
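As a rough illustration of the idea described above, the following toy sketch fits a linear scorer to human-labeled frame-level importance scores (the TVSum-style annotation) and then keeps the highest-scoring frames of an unseen video. It is not the method of any paper listed here: the function names, the two-dimensional features, the made-up scores, and the plain gradient-descent regressor are all assumptions chosen to keep the example small, whereas real systems typically use deep features and far richer models.

```python
# Toy sketch only: all names and data below are hypothetical.

def train_linear_scorer(features, scores, lr=0.1, epochs=2000):
    """Fit w, b so that w.x + b approximates the annotated importance score
    (mean squared error, full-batch gradient descent)."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, scores):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(dim):
                grad_w[j] += 2.0 * err * x[j] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_scores(w, b, features):
    """Predict an importance score for every frame of a video."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]


def select_keyframes(pred_scores, budget):
    """Keep the `budget` highest-scoring frames, returned in temporal order."""
    ranked = sorted(range(len(pred_scores)),
                    key=lambda i: pred_scores[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical training video: one feature vector and one human-labeled
# importance score per frame.
train_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
train_scores = [0.05, 0.85, 0.15, 0.95, 0.50, 0.29]

w, b = train_linear_scorer(train_features, train_scores)

# Hypothetical unseen video: score its frames and keep a 2-frame summary.
test_features = [[0.9, 0.5], [0.1, 0.5], [0.8, 0.2], [0.2, 0.9]]
summary = select_keyframes(predict_scores(w, b, test_features), budget=2)
print(summary)
```

For a SumMe-style dataset, where the annotation is a set of reference summaries rather than per-frame scores, one common workaround is to derive per-frame targets from how often each frame appears in the reference summaries; that step is left out of this sketch.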

Papers

Showing 1-10 of 28 papers

Title | Status | Hype
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization | Code | 1
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization | Code | 1
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | Code | 1
Self-Attention Recurrent Summarization Network with Reinforcement Learning for Video Summarization Task | Code | 1
Progressive Video Summarization via Multimodal Self-supervised Learning | Code | 1
Align and Attend: Multimodal Summarization with Dual Contrastive Losses | Code | 1
Combining Global and Local Attention with Positional Encoding for Video Summarization | Code | 1
Discriminative Feature Learning for Unsupervised Video Summarization | Code | 0
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward | Code | 0
CLIP-It! Language-Guided Video Summarization | Code | 0

No leaderboard results yet.