Depthwise Separable Temporal Convolutional Network for Action Segmentation

2021-01-19 · 2020 International Conference on 3D Vision (3DV), 2021

Basavaraj Hampiholi, Christian Jarvers, Wolfgang Mader, Heiko Neumann

Abstract

Fine-grained temporal action segmentation in long, untrimmed RGB videos is a key topic in visual human-machine interaction. Recent temporal convolution based approaches use either an encoder-decoder (ED) architecture or dilations with a doubling factor in consecutive convolution layers to segment actions in videos. However, ED networks operate on low temporal resolution, and the dilations in successive layers cause gridding artifacts. We propose a depthwise separable temporal convolution network (DS-TCN) that operates on full temporal resolution and with reduced gridding effects. The basic component of DS-TCN is the residual depthwise dilated block (RDDB). We explore the trade-off between large kernels and small dilation rates using RDDB. We show that our DS-TCN is capable of capturing long-term dependencies as well as local temporal cues efficiently. Our evaluation on three benchmark datasets, GTEA, 50Salads, and Breakfast, demonstrates that DS-TCN outperforms the existing ED-TCN and dilation-based TCN baselines even with comparatively fewer parameters.
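The parameter savings and the kernel-size versus dilation trade-off mentioned in the abstract can be illustrated with some simple arithmetic. The sketch below is not the authors' implementation: the function names, the channel width of 64, and the kernel sizes and dilation schedules are all hypothetical choices made for illustration, and the abstract does not state the actual RDDB configuration. It relies only on the standard weight-count formulas for 1D convolutions and on the receptive-field formula for a stack of stride-1 dilated convolutions.

```python
def standard_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Weights of an ordinary 1D convolution: every output channel
    sees every input channel across the whole kernel."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Weights of a depthwise convolution (one filter per input channel)
    followed by a pointwise (kernel size 1) convolution that mixes channels."""
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def receptive_field(kernel_sizes, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated convolutions:
    each layer adds (kernel_size - 1) * dilation frames."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


# Hypothetical layer width, chosen only for illustration.
channels = 64

# With a small kernel the separable layer is already cheaper ...
small_standard = standard_conv1d_params(channels, channels, 3)             # 12352
small_separable = depthwise_separable_conv1d_params(channels, channels, 3)  # 4416

# ... and a large kernel barely increases its cost, while the standard
# layer grows linearly with the kernel size.
large_standard = standard_conv1d_params(channels, channels, 25)
large_separable = depthwise_separable_conv1d_params(channels, channels, 25)

# Ten layers, kernel size 3, dilation doubling every layer.
rf_doubling = receptive_field([3] * 10, [2 ** i for i in range(10)])        # 2047

# Six layers, kernel size 25, smaller dilations (assumed schedule).
rf_large_kernel = receptive_field([25] * 6, [2 ** i for i in range(6)])     # 1513
```

Under these assumed settings, a depthwise separable layer with a kernel of 25 still has fewer weights than a standard layer with a kernel of 3, which is why large kernels with small dilation rates become affordable in the separable setting.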

Tasks

Action Segmentation, Decoder, Temporal Action Segmentation
