MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation

2019-03-05 · CVPR 2019 · Code Available

Yazan Abu Farha, Juergen Gall

Code

github.com/yabufarha/ms-tcn
Official · In paper · PyTorch
github.com/MindSpore-paper-code-2/code3/tree/main/TCN
MindSpore

Abstract

Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional approaches follow a two-step pipeline, by generating frame-wise probabilities and then feeding them to high-level temporal models, recent approaches use temporal convolutions to directly classify the video frames. In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. Each stage features a set of dilated temporal convolutions to generate an initial prediction that is refined by the next one. This architecture is trained using a combination of a classification loss and a proposed smoothing loss that penalizes over-segmentation errors. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our model achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
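
The training objective described in the abstract, a classification loss combined with a smoothing loss, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the smoothing term is a truncated mean squared error over differences of frame-wise log-probabilities between adjacent frames, and the function name `segmentation_loss`, the weight `lam`, and the truncation threshold `tau` (with their default values) are assumptions made for this example.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one frame's class scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def segmentation_loss(frame_scores, labels, lam=0.15, tau=4.0):
    """Return (total, classification, smoothing) for one video.

    frame_scores: list of T lists, each holding C raw class scores.
    labels:       list of T ground-truth class indices.
    lam, tau:     assumed smoothing weight and truncation threshold.
    """
    log_probs = [log_softmax(row) for row in frame_scores]
    num_frames, num_classes = len(log_probs), len(log_probs[0])

    # Classification term: mean frame-wise cross-entropy.
    cls = -sum(log_probs[t][labels[t]] for t in range(num_frames)) / num_frames

    # Smoothing term: squared change in log-probability between adjacent
    # frames, clipped at tau so that large jumps at genuine action
    # boundaries are not penalized without bound.
    smooth = 0.0
    for t in range(1, num_frames):
        for c in range(num_classes):
            delta = abs(log_probs[t][c] - log_probs[t - 1][c])
            smooth += min(delta, tau) ** 2
    smooth /= num_frames * num_classes

    return cls + lam * smooth, cls, smooth
```

Under these assumptions, a prediction that flickers between classes on every frame receives a larger smoothing penalty than one that switches class once, which is the over-segmentation behavior the smoothing loss is meant to discourage.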

Tasks

Action Segmentation, Segmentation, Temporal Action Segmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
50 Salads	MS-TCN	F1@50%	64.5	—	Unverified
Breakfast	MS-TCN (IDT)	Average F1	50.6	—	Unverified
Breakfast	MS-TCN (I3D)	Average F1	46.2	—	Unverified
GTEA	MS-TCN	F1@50%	74.6	—	Unverified
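
For readers unfamiliar with the F1@50% metric in the table, the sketch below shows one common way a segmental F1 score at an intersection-over-union (IoU) threshold is computed. The page does not state the evaluation protocol behind the claimed numbers, so this is an assumption: predicted segments are matched to ground-truth segments of the same class, a match counts as a true positive when its IoU reaches the threshold, and each ground-truth segment can be matched at most once. The helper names `segments` and `f1_at_iou` are hypothetical.

```python
def segments(labels):
    """Collapse frame labels into (label, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            runs.append((labels[start], start, t))
            start = t
    return runs


def f1_at_iou(pred, truth, threshold=0.5):
    """Segmental F1 on a 0-1 scale for frame-wise label sequences."""
    pred_segs, true_segs = segments(pred), segments(truth)
    matched = [False] * len(true_segs)
    tp = fp = 0

    for label, p_start, p_end in pred_segs:
        # Find the best-overlapping ground-truth segment of the same class.
        best_iou, best_idx = 0.0, -1
        for idx, (t_label, t_start, t_end) in enumerate(true_segs):
            if t_label != label:
                continue
            inter = max(0, min(p_end, t_end) - max(p_start, t_start))
            union = max(p_end, t_end) - min(p_start, t_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx

        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1  # spurious or duplicate segment

    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

As a usage example, with `truth = [0, 0, 0, 0, 1, 1, 1, 1]` and `pred = [0, 0, 0, 0, 1, 0, 1, 1]`, only one frame in eight is wrong, yet `f1_at_iou(pred, truth)` is about 0.667 because the single mislabeled frame splits one action into extra short segments. Multiply by 100 to compare with the percentage scale used in the table.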
