Title
Action recognition using saliency learned from recorded human gaze.
Abstract
This paper addresses the problem of recognizing and localizing actions in image sequences by utilizing, in the training phase only, gaze-tracking data of people watching videos that depict the actions in question. First, we learn discriminative action features at the areas of gaze fixation and train a Convolutional Network that predicts areas of fixation (i.e., salient regions) from raw image data. Second, we propose a Support Vector Machine-based method for joint recognition and localization, in which the bounding box of the action in question is treated as a latent variable. In our formulation the optimization both minimizes the classification cost and maximizes the saliency within the bounding box. We show that this formulation outperforms the variant in which only the classification cost is minimized, i.e. in which saliency within the bounding box is ignored. Furthermore, our results outperform the state of the art on the UCF Sports dataset.
Highlights: 3D CNN action features learned using gaze information outperform handcrafted features. Using latent variables that localize the action in an SVM framework is beneficial. Using saliency learned from gaze in a latent SVM framework is beneficial.
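A minimal sketch of the kind of objective the abstract describes, assuming a linear latent-SVM scoring function; the symbols \phi, S, \lambda and the candidate-box set \mathcal{H}(x) are illustrative notation, not taken from the paper:

f_w(x) = \max_{h \in \mathcal{H}(x)} \big[ \langle w, \phi(x, h) \rangle + \lambda \, S(x, h) \big]

Here \phi(x, h) would be the 3D CNN action features pooled within a candidate bounding box h, S(x, h) the predicted saliency mass inside h, and \lambda a trade-off weight; training would then minimize a regularized hinge loss on f_w, and the maximizing h at test time would give the localization. Setting \lambda = 0 recovers the classification-only baseline that the abstract compares against.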
Year
2016
DOI
10.1016/j.imavis.2016.06.006
Venue
Image Vision Comput.
Keywords
Action recognition, Saliency, Support Vector Machine (SVM), Latent variable, 3D Convolutional Neural Network (3D CNN)
Field
Computer vision, Gaze, Pattern recognition, Salience (neuroscience), Computer science, Action recognition, Support vector machine, Latent variable, Artificial intelligence, Discriminative model, Salient, Minimum bounding box
DocType
Journal
Volume
52
Issue
C
ISSN
0262-8856
Citations
3
PageRank
0.45
References
0
Authors
2
Name             Order  Citations  PageRank
Daria Stefic     1      3          0.45
Ioannis Patras   2      1960       123.15