Summarizing Videos with Attention
Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino
Code Available
- github.com/ok1zjf/VASNet (official, in paper), PyTorch, ★ 0
- github.com/thswodnjs3/CSTA, PyTorch, ★ 68
- github.com/590shun/summarizer, PyTorch, ★ 0
- github.com/VinACE/trans-vsumm, PyTorch, ★ 0
- github.com/azhar0100/VASNet, PyTorch, ★ 0
Abstract
In this work we propose a novel method for supervised, keyshot-based video summarization by applying a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bi-directional recurrent networks such as BiLSTM combined with attention; these networks are complex to implement and computationally demanding compared to fully connected networks. To that end we propose a simple, self-attention-based network for video summarization which performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe.
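To make the core idea concrete, below is a minimal NumPy sketch of soft self-attention applied to per-frame features, producing one importance score per frame in a single feed-forward pass. This is an illustration of the general mechanism the abstract describes, not the authors' VASNet implementation: the projection matrices, the scaling, and the single-layer sigmoid regressor here are simplifying assumptions (the actual network's layer sizes, normalization, and regressor design differ).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_importance(features, W_q, W_k, W_v, w_out):
    """Soft self-attention over a whole video in one pass.

    features: (T, D) array, one D-dimensional CNN feature per frame.
    W_q, W_k, W_v: (D, d) learned projections (random here, for illustration).
    w_out: (d,) weights of a hypothetical linear regressor.
    Returns a (T,) array of per-frame importance scores in (0, 1).
    """
    Q = features @ W_q                                   # queries  (T, d)
    K = features @ W_k                                   # keys     (T, d)
    V = features @ W_v                                   # values   (T, d)
    A = softmax(Q @ K.T / np.sqrt(Q.shape[1]), axis=-1)  # (T, T) attention weights
    C = A @ V                                            # context vector per frame
    return 1.0 / (1.0 + np.exp(-(C @ w_out)))            # sigmoid -> importance

rng = np.random.default_rng(0)
T, D, d = 8, 16, 8                                       # 8 frames, toy dimensions
feats = rng.normal(size=(T, D))
scores = frame_importance(
    feats,
    rng.normal(size=(D, d)),
    rng.normal(size=(D, d)),
    rng.normal(size=(D, d)),
    rng.normal(size=(d,)),
)
print(scores.shape)
```

Because every frame attends to every other frame in one matrix product, the whole sequence-to-sequence transformation needs no recurrence, which is what makes this family of models cheaper than BiLSTM-based summarizers.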
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| SumMe | VASNet | F1-score (Canonical) | 49.71 | — | Unverified |
| TvSum | VASNet | F1-score (Canonical) | 61.42 | — | Unverified |