Title
Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data Is Continuous and Weakly Labelled
Abstract
This work presents a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. This allows the CNN to be trained on a vast number of example images when only loose sequence-level information is available for the source videos. Although we demonstrate this in the context of hand shape recognition, the approach has wider application to any video recognition task where frame-level labelling is not available. The EM algorithm leverages the discriminative ability of the CNN to iteratively refine both the frame-level annotation and the subsequent training of the CNN. By embedding the classifier within an EM framework, the CNN can easily be trained on 1 million hand images. We demonstrate that the final classifier generalises over both individuals and data sets. The algorithm is evaluated on over 3,000 manually labelled hand shape images of 60 different classes, which will be released to the community. Furthermore, we demonstrate its use in continuous sign language recognition on two publicly available large sign language data sets, where it outperforms the current state of the art by a large margin. To our knowledge, no previous work has explored expectation maximization without Gaussian mixture models to exploit weak sequence labels for sign language recognition.
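The abstract describes the training loop only at a high level. What follows is a minimal sketch of that alternation under stated assumptions, not the authors' implementation. Each video's weak label is assumed to be an ordered sequence of hand shape classes. The E-step is assumed to be a hard, Viterbi-style monotonic alignment of frames to that sequence. A nearest-centroid classifier on toy 1-D features stands in for the CNN, so the example runs with the Python standard library alone. All function names (`fit_centroids`, `frame_cost`, `classify`, `uniform_alignment`, `align`, `weakly_supervised_em`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Tuple[float, ...]
Model = Dict[str, Frame]  # class label -> centroid (hypothetical stand-in for CNN weights)


def fit_centroids(frames: Sequence[Frame], labels: Sequence[str]) -> Model:
    """M-step stand-in: 'retrain' the frame classifier on the current frame labels."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(frames, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in sums[y]) for y in sums}


def frame_cost(model: Model, x: Frame, y: str) -> float:
    """Squared distance to the class centroid; stands in for a CNN-based frame score."""
    return sum((a - b) ** 2 for a, b in zip(x, model[y]))


def classify(model: Model, x: Frame) -> str:
    """Frame-level prediction: the class with the lowest cost."""
    return min(model, key=lambda y: frame_cost(model, x, y))


def uniform_alignment(video: Sequence[Frame], weak: Sequence[str]) -> List[str]:
    """Initial guess: spread the ordered weak labels evenly over the frames."""
    T, K = len(video), len(weak)
    return [weak[t * K // T] for t in range(T)]


def align(model: Model, video: Sequence[Frame], weak: Sequence[str]) -> List[str]:
    """Hard E-step: lowest-cost monotonic frame-to-label alignment (dynamic programming).

    Every weak label covers at least one contiguous run of frames, in order.
    """
    T, K = len(video), len(weak)
    if T < K:
        raise ValueError("need at least as many frames as weak labels")
    INF = math.inf
    cost = [[INF] * K for _ in range(T)]  # cost[t][k]: best cost with frame t on label k
    back = [[0] * K for _ in range(T)]
    cost[0][0] = frame_cost(model, video[0], weak[0])
    for t in range(1, T):
        for k in range(K):
            stay = cost[t - 1][k]
            move = cost[t - 1][k - 1] if k > 0 else INF
            best, prev = (stay, k) if stay <= move else (move, k - 1)
            if best < INF:
                cost[t][k] = best + frame_cost(model, video[t], weak[k])
                back[t][k] = prev
    path = [K - 1]  # the last frame must belong to the last weak label
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [weak[k] for k in path]


def weakly_supervised_em(videos, weak_labels, iterations=3):
    """Alternate retraining (M-step) and frame re-alignment (E-step)."""
    alignments = [uniform_alignment(v, w) for v, w in zip(videos, weak_labels)]
    model: Model = {}
    for _ in range(iterations):
        frames = [x for v in videos for x in v]
        labels = [y for a in alignments for y in a]
        model = fit_centroids(frames, labels)  # M-step: retrain on current labels
        alignments = [align(model, v, w) for v, w in zip(videos, weak_labels)]  # E-step
    return model, alignments


# Toy data: 1-D "features"; each video carries only its ordered class sequence.
videos = [
    [(0.0,), (0.1,), (0.0,), (0.1,), (5.0,), (5.1,)],
    [(5.0,), (5.2,), (0.1,), (0.0,)],
]
weak_labels = [["A", "B"], ["B", "A"]]
model, alignments = weakly_supervised_em(videos, weak_labels)
print(alignments)  # [['A', 'A', 'A', 'A', 'B', 'B'], ['B', 'B', 'A', 'A']]
```

The uniform initial split wrongly assigns the fourth frame of the first video to class B; after one round of retraining and re-alignment it moves to A. In a real system the M-step would retrain the CNN on the re-aligned frames and the E-step would score frames with the CNN's outputs; the centroid stand-in keeps only the alternation the abstract describes, in which the classifier refines the frame labels and the refined labels retrain the classifier.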
Year: 2016
DOI: 10.1109/CVPR.2016.412
Venue: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Field: Computer vision, Data set, Embedding, Annotation, Pattern recognition, Expectation–maximization algorithm, Computer science, Sign language, Artificial intelligence, Classifier (linguistics), Discriminative model, Mixture model
DocType: Conference
Volume: 2016
Issue: 1
ISSN: 1063-6919
Citations: 9
PageRank: 0.52
References: 12
Authors: 3
Name             Order  Citations  PageRank
Oscar Koller     1      128        9.02
Hermann Ney      2      14178      1506.93
Richard Bowden   3      1840       118.50