StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen
Code
- github.com/hyperfraise/StNet (PyTorch)
- github.com/2023-MindSpore-1/ms-code-217/tree/main/stnet (MindSpore)
- github.com/kingcong/stnet (MindSpore)
- github.com/mindspore-ai/models/tree/master/research/cv/stnet (MindSpore)
- github.com/BigLazyPig/Pytorch-StNet-Full-Implement (PyTorch)
- github.com/MindSpore-paper-code-2/code3/tree/main/stnet (MindSpore)
- github.com/2023-MindSpore-4/Code7/tree/main/stnet (MindSpore)
- github.com/hyperfraise/Pytorch-StNet (PyTorch)
Abstract
Despite the success of deep learning for static image understanding, it remains unclear which network architectures are most effective for spatial-temporal modeling in videos. In this paper, in contrast to existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos. In particular, StNet stacks N successive video frames into a super-image with 3N channels and applies 2D convolution on super-images to capture local spatial-temporal relationships. To model global spatial-temporal relationships, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet. It employs separate channel-wise and temporal-wise convolutions over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and strikes a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset.
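To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) grouping N successive frames into a 3N-channel super-image for 2D convolution and (b) a temporal Xception-style block that applies a channel-wise (depthwise) temporal convolution followed by a point-wise convolution over the per-segment feature sequence. Kernel sizes, channel counts, and the exact block layout are illustrative assumptions, not the authors' configuration; see the linked repositories for the reference implementations.

```python
import torch
import torch.nn as nn


def make_super_images(frames, n=5):
    """Group N successive RGB frames into super-images with 3N channels.

    frames: (batch, T, 3, H, W) with T divisible by n
    returns: (batch * T/n, 3n, H, W), ready for ordinary 2D convolution
    """
    b, t, c, h, w = frames.shape
    assert t % n == 0, "number of frames must be divisible by n"
    return frames.reshape(b * (t // n), c * n, h, w)


class TemporalXceptionBlock(nn.Module):
    """Separable temporal convolution over the segment feature sequence:
    a channel-wise (depthwise) temporal conv followed by a point-wise conv
    that mixes channels (block details are assumptions for illustration)."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, 1)
        self.bn = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (batch, channels, num_segments)
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    video = torch.randn(2, 25, 3, 224, 224)        # 2 clips, 25 frames each
    supers = make_super_images(video, n=5)          # (10, 15, 224, 224)
    # local spatial-temporal features via plain 2D convolution (toy stem)
    local_feats = nn.Conv2d(15, 64, 7, stride=2, padding=3)(supers)
    # pool each super-image to a vector and regroup into a temporal sequence
    seq = local_feats.mean(dim=(2, 3)).view(2, 5, 64).transpose(1, 2)
    out = TemporalXceptionBlock(64)(seq)            # (2, 64, 5)
    print(out.shape)
```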