Title
Unsupervised Learning using Sequential Verification for Action Recognition.
Abstract
In this paper, we consider the problem of learning a visual representation from the raw spatiotemporal signals in videos for use in action recognition. Our representation is learned without supervision from semantic labels. We formulate it as an unsupervised sequential verification task, i.e., we determine whether a sequence of frames from a video is in the correct temporal order. With this simple task and no semantic labels, we learn a powerful unsupervised representation using a Convolutional Neural Network (CNN). The representation contains complementary information to that learned from supervised image datasets like ImageNet. Qualitative results show that our method captures information that is temporally varying, such as human pose. When used as pre-training for action recognition, our method gives significant gains over learning without external data on benchmark datasets like UCF101 and HMDB51. Our method can also be combined with supervised representations to provide an additional boost in accuracy for action recognition. Finally, to quantify its sensitivity to human pose, we show results for human pose estimation on the FLIC dataset that are competitive with approaches using significantly more supervised training data.
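The verification task described in the abstract can be illustrated with a minimal sketch of how training examples might be built. The abstract does not specify the sampling scheme, so everything here is an assumption made for illustration: the function name `make_verification_example`, the tuple length of 3, the uniform sampling of frame indices, and the 50/50 split between ordered and shuffled tuples. It is not the authors' implementation.

```python
import random


def make_verification_example(num_frames, tuple_len=3, rng=None):
    """Return (frame_indices, label) for one hypothetical training example.

    label is 1 if the indices are in temporal order, 0 otherwise.
    All sampling choices here are illustrative assumptions.
    """
    if tuple_len < 2 or tuple_len > num_frames:
        raise ValueError("need 2 <= tuple_len <= num_frames")
    if rng is None:
        rng = random.Random()

    # Pick distinct frame indices and sort them into temporal order.
    ordered = sorted(rng.sample(range(num_frames), tuple_len))

    # Assumed: half of the examples are positives (correct order).
    if rng.random() < 0.5:
        return ordered, 1

    # Negative example: permute until the order differs from the temporal one.
    shuffled = ordered[:]
    while shuffled == ordered:
        rng.shuffle(shuffled)
    return shuffled, 0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_verification_example(num_frames=50, rng=rng))
```

A classifier (a CNN in the paper) would then be trained to predict the label from the frames at those indices, which requires no semantic annotation.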
Year
2016
Venue
arXiv: Computer Vision and Pattern Recognition
Field
Pattern recognition, Convolutional neural network, Computer science, Action recognition, Pose, Unsupervised learning, Supervised training, Artificial intelligence, Machine learning
DocType
Journal
Volume
abs/1603.08561
Citations
2
PageRank
0.36
References
0
Authors
3
Name, Order, Citations, PageRank
Ishan Misra, 1, 201, 12.69
C. Lawrence Zitnick, 2, 7321, 332.72
Martial Hebert, 3, 11277, 1146.89