Abstract |
---|
We look at the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. To this end, we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capturing the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework, followed by model compression. The final model is less than 1 MB in size, less than one-hundredth the size of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices. |
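The compact model described in the abstract is obtained via knowledge distillation. The paper does not give its exact objective here, but a minimal NumPy sketch of the standard distillation loss (soft teacher targets at a temperature `T` blended with the hard-label cross-entropy via a weight `alpha`; both hyperparameter names are assumptions for illustration) looks like:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # stabilize against overflow
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    """Blend of soft-target KL term and hard-label cross-entropy.

    T and alpha are hypothetical hyperparameters, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) on softened outputs, scaled by T^2 as is conventional
    soft = float(np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))) * T * T
    # Standard cross-entropy against the ground-truth class
    hard = float(-np.log(softmax(student_logits)[true_label]))
    return alpha * soft + (1 - alpha) * hard
```

When the student matches the teacher exactly, the soft term vanishes and only the hard-label cross-entropy remains, which is one quick sanity check on an implementation.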
Year | DOI | Venue
---|---|---
2017 | 10.1109/icip.2017.8297033 | 2017 24th IEEE International Conference on Image Processing (ICIP)
DocType | Volume | ISSN
---|---|---
Conference | abs/1712.10136 | 1522-4880
Citations | PageRank | References
---|---|---
0 | 0.34 | 10
Authors |
---|
2 |
Name | Order | Citations | PageRank
---|---|---|---
Koustav Mullick | 1 | 0 | 0.34
Anoop M. Namboodiri | 2 | 255 | 26.36