Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang, Zhenyou Wang, Shanna Zhuang, Hui Wang
- Code: github.com/Wangjhdeveloper/CETNet (official PyTorch implementation)
Abstract
Temporal convolutions have been the paradigm of choice for action segmentation, enlarging long-term receptive fields by stacking convolution layers. However, deep stacks lose the local information necessary for frame recognition. To address this problem, we propose a novel encoder-decoder structure, the Cross-Enhancement Transformer, which learns temporal structure representations effectively through an interactive self-attention mechanism. The convolutional feature maps from each encoder layer are concatenated with the features the decoder produces via self-attention, so that local and global information are exploited simultaneously over a series of frame actions. In addition, a new loss function is proposed to enhance training by penalizing over-segmentation errors. Experiments show that our framework achieves state-of-the-art performance on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
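The central idea of combining local convolutional features with global self-attention features can be sketched as follows. This is a minimal illustrative NumPy toy, not the paper's code: the function names, shapes, single-head attention, and the depthwise convolution are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head scaled dot-product self-attention over time. x: (T, d)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # (T, T) frame-to-frame affinities
    return softmax(scores, axis=-1) @ x  # global context for each frame

def temporal_conv(x, w):
    """Depthwise temporal convolution with 'same' padding. x: (T, d), w: (k, d)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * w).sum(axis=0) for t in range(x.shape[0])])

def cross_enhancement_layer(x, w):
    """Concatenate local (conv) and global (attention) features per frame."""
    local_feat = temporal_conv(x, w)   # short-range temporal cues
    global_feat = self_attention(x)    # long-range dependencies
    return np.concatenate([local_feat, global_feat], axis=-1)  # (T, 2d)

T, d, k = 8, 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))        # T frames, d-dim features per frame
out = cross_enhancement_layer(x, rng.standard_normal((k, d)))
print(out.shape)  # (8, 8)
```

A frame-wise classifier would then operate on the concatenated `(T, 2d)` representation, so each frame's prediction sees both its local neighborhood and the whole sequence.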
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 50Salads | CETNet | F1@50 | 80.1 | — | Unverified |
| Breakfast | CETNet | Average F1 | 71.8 | — | Unverified |
| GTEA | CETNet | F1@50 | 81.3 | — | Unverified |