A Closer Look at Spatiotemporal Convolutions for Action Recognition

2017-11-30 · CVPR 2018 · Code available

Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri

Abstract

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of a video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, "R(2+1)D", which gives rise to CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101 and HMDB51.
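Factorizing a t×d×d 3D convolution into a 1×d×d spatial convolution followed by a t×1×1 temporal convolution leaves one design choice open: how many channels M the intermediate layer should have. The following is a minimal sketch, assuming the goal is a factorized block with roughly the same number of weights as the full 3D filter it replaces (biases ignored). Matching the budgets in this way means any accuracy difference comes from the factorization itself rather than from extra capacity. The function names are hypothetical and are not taken from the authors' code release.

```python
def conv3d_params(t: int, d: int, n_in: int, n_out: int) -> int:
    """Weight count of a full t x d x d 3D convolution (biases ignored)."""
    return t * d * d * n_in * n_out


def r2plus1d_params(t: int, d: int, n_in: int, n_out: int, m: int) -> int:
    """Weight count of a factorized block (hypothetical helper):
    a 1 x d x d spatial conv (n_in -> m) followed by a t x 1 x 1
    temporal conv (m -> n_out). Biases are ignored."""
    spatial = d * d * n_in * m
    temporal = t * m * n_out
    return spatial + temporal


def r2plus1d_mid_channels(t: int, d: int, n_in: int, n_out: int) -> int:
    """Intermediate width M chosen so the factorized block does not exceed
    the 3D conv's weight count (hypothetical helper).

    Setting m * (d*d*n_in + t*n_out) equal to t*d*d*n_in*n_out and
    solving for m gives the expression below; integer division rounds
    down, so the factorized block never has more weights than the 3D one.
    """
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


if __name__ == "__main__":
    # A 3x3x3 convolution mapping 64 channels to 64 channels.
    t, d, n_in, n_out = 3, 3, 64, 64
    m = r2plus1d_mid_channels(t, d, n_in, n_out)
    print(m)                                         # 144
    print(conv3d_params(t, d, n_in, n_out))          # 110592
    print(r2plus1d_params(t, d, n_in, n_out, m))     # 110592
```

In this example the two budgets match exactly. When the division is not exact, for instance with 64 input and 128 output channels, rounding down leaves the factorized block slightly smaller than the 3D convolution.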

Benchmark Results

Dataset | Model | Metric | Claimed (%) | Verified | Status
--- | --- | --- | --- | --- | ---
HMDB-51 | R(2+1)D-Flow (Kinetics pretrained) | Average accuracy over 3 splits | 76.4 | | Unverified
HMDB-51 | R(2+1)D-TwoStream (Kinetics pretrained) | Average accuracy over 3 splits | 78.7 | | Unverified
HMDB-51 | R(2+1)D-RGB (Kinetics pretrained) | Average accuracy over 3 splits | 74.5 | | Unverified
HMDB-51 | R(2+1)D-TwoStream (Sports-1M pretrained) | Average accuracy over 3 splits | 72.7 | | Unverified
HMDB-51 | R(2+1)D-Flow (Sports-1M pretrained) | Average accuracy over 3 splits | 70.1 | | Unverified
HMDB-51 | R(2+1)D-RGB (Sports-1M pretrained) | Average accuracy over 3 splits | 66.6 | | Unverified
Sports-1M | R(2+1)D-Flow-32frame | Video hit@1 | 68.4 | | Unverified
Sports-1M | R(2+1)D-Two-Stream-32frame | Video hit@1 | 73.3 | | Unverified
Sports-1M | R(2+1)D-RGB-32frame | Video hit@1 | 73 | | Unverified
UCF101 | R(2+1)D-TwoStream (Kinetics pretrained) | 3-fold accuracy | 97.3 | | Unverified
UCF101 | R(2+1)D-RGB (Kinetics pretrained) | 3-fold accuracy | 96.8 | | Unverified
UCF101 | R(2+1)D-TwoStream (Sports-1M pretrained) | 3-fold accuracy | 95 | | Unverified
UCF101 | R(2+1)D-Flow (Kinetics pretrained) | 3-fold accuracy | 95.5 | | Unverified
UCF101 | R(2+1)D-RGB (Sports-1M pretrained) | 3-fold accuracy | 93.6 | | Unverified
UCF101 | R(2+1)D-Flow (Sports-1M pretrained) | 3-fold accuracy | 93.3 | | Unverified

Reproductions