Learning Deep and Compact Models for Gesture Recognition
Koustav Mullick, Anoop M. Namboodiri
Abstract
We address the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. To this end, we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capturing the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework followed by model compression. The final model is less than 1 MB in size, which is less than one hundredth of the size of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices.
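The abstract mentions a knowledge distillation framework but does not give the loss used. As an illustration only, the following minimal sketch shows the commonly used soft-target formulation, in which a small student is trained to match the temperature-softened outputs of a larger teacher while also fitting the true label. The function names, the temperature of 4.0, and the weight `alpha` are assumptions made for this example, not values taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Hypothetical distillation objective (assumed, not the paper's exact loss).

    soft term: KL(teacher || student) on temperature-softened distributions,
               scaled by temperature**2 to keep gradient magnitudes comparable
    hard term: cross-entropy of the student against the true class label
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(p_teacher, p_student) if p > 0)
    cross_entropy = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]   # made-up logits for a 3-class toy problem
    student = [2.5, 1.5, 0.2]
    print(distillation_loss(student, teacher, label=0))
```

When the student's logits equal the teacher's, the soft term is zero and only the weighted cross-entropy remains, so the soft term measures how far the student's output distribution is from the teacher's.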
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ChaLearn 2014 | 3D-CNN + LSTM | Accuracy | 93.2 | — | Unverified |