
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

2016-08-02 · Code Available

Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang


Abstract

This paper presents the method that underlies our submission to the untrimmed video classification task of the ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further improve performance with several additional techniques. Specifically, we use recent deep model architectures, e.g., ResNet and Inception V3, and introduce new aggregation schemes (top-k and attention-weighted pooling). Additionally, we incorporate audio as a complementary channel, extracting relevant information via a CNN applied to the spectrograms. With these techniques, we derive an ensemble of deep models which together attains a high classification accuracy (mAP 93.23%) on the testing set and secured first place in the challenge.
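The two aggregation schemes named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array layout (one row of class scores per video snippet), and the use of a softmax over externally supplied attention logits are all assumptions made for illustration. In the actual method the attention weights would presumably be learned; here they are simply passed in.

```python
import numpy as np


def top_k_pooling(scores, k):
    """Average the k highest snippet scores for each class.

    scores: array of shape (num_snippets, num_classes) (assumed layout).
    Returns an array of shape (num_classes,).
    """
    scores = np.asarray(scores, dtype=float)
    # Clamp k to a valid range so the slice below is never empty.
    k = max(1, min(k, scores.shape[0]))
    # Sort each class column in ascending order and keep the last k rows.
    top = np.sort(scores, axis=0)[-k:]
    return top.mean(axis=0)


def attention_weighted_pooling(scores, attention_logits):
    """Weighted average of snippet scores using softmax-normalized weights.

    attention_logits: array of shape (num_snippets,). These are hypothetical
    inputs standing in for whatever a learned attention module would produce.
    """
    scores = np.asarray(scores, dtype=float)
    logits = np.asarray(attention_logits, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ scores


if __name__ == "__main__":
    # Four snippets, two classes (made-up numbers).
    snippet_scores = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.4], [0.7, 0.3]]
    print(top_k_pooling(snippet_scores, k=2))
    print(attention_weighted_pooling(snippet_scores, [0.0, 2.0, 0.0, 0.0]))
```

In this sketch, top-k pooling with k equal to the number of snippets reduces to plain average pooling and k = 1 reduces to max pooling, while uniform attention logits also recover average pooling.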
