Temporal Aggregate Representations for Long-Range Video Understanding

2020-06-01 · ECCV 2020

Fadime Sener, Dipika Singhania, Angela Yao

Code

github.com/dibschat/tempAgg
Official · PyTorch · ★ 11
github.com/dipika-singhania/multi-scale-action-banks
PyTorch · ★ 6

Abstract

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state of the art in both next action and dense anticipation with simple techniques such as max-pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on Breakfast, 50Salads, and EPIC-Kitchens datasets, where we achieve state-of-the-art results. With minimal modifications, our model can also be extended for video segmentation and action recognition.
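The abstract names max-pooling and attention over multi-granular temporal windows as the core operations. The sketch below illustrates that general idea under assumptions of its own and is not the authors' TempAgg architecture (the official repository above has that). It assumes precomputed per-frame features of shape (T, D). It treats the last `recent_len` frames as recent context and max-pools the whole observation into 2, 4 and 8 snippets. The max-pooled recent vector then attends over each scale with plain scaled dot-product attention. The function names and the choice of scales are hypothetical.

```python
import numpy as np


def max_pool_snippets(features, num_snippets):
    """Split a (T, D) feature sequence into num_snippets contiguous chunks
    and max-pool each chunk over time, giving (num_snippets, D)."""
    chunks = np.array_split(features, num_snippets, axis=0)
    return np.stack([chunk.max(axis=0) for chunk in chunks])


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(query, keys):
    """Scaled dot-product attention of one (D,) query over (N, D) keys.
    Assumption: the keys double as values (no learned projections)."""
    scores = keys @ query / np.sqrt(query.shape[0])
    return softmax(scores) @ keys


def temporal_aggregate(features, recent_len, scales=(2, 4, 8)):
    """Hypothetical multi-granular aggregate: the max-pooled recent context
    concatenated with one attended summary per snippet granularity."""
    if features.shape[0] < max(scales) or not 0 < recent_len <= features.shape[0]:
        raise ValueError("sequence too short for the requested scales")
    query = features[-recent_len:].max(axis=0)  # recent context vector
    summaries = [attend(query, max_pool_snippets(features, k)) for k in scales]
    return np.concatenate([query] + summaries)
```

Because every scale contributes one D-dimensional summary, the output has a fixed size of D × (1 + number of scales) regardless of how long the observed video is, so a simple classifier for next-action anticipation could sit on top of it.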

Tasks

Action Anticipation, Action Recognition, Future prediction, Video Segmentation, Video Semantic Segmentation, Video Understanding

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Assembly101	TempAgg	Verbs Recall@5	59.11	—	Unverified