Title
First Person Action-Object Detection with EgoNet.
Abstract
Objects afford visual sensation and motor actions. A first person camera, placed at a person's head, captures unscripted moments of our visual sensorimotor object interactions. Can a single first person image tell us about our momentary visual attention and motor action with objects, without a gaze tracking device or tactile sensors? To study the holistic correlation of visual attention with motor action, we introduce the concept of action-objects---objects associated with seeing and touching actions, which exhibit a characteristic 3D spatial distance and orientation with respect to the person. A predictive action-object model is designed to re-organize the space of interactions in terms of visual and tactile sensations, and is realized by our proposed EgoNet network. EgoNet is composed of two convolutional neural networks: 1) a Semantic Gaze Pathway that learns 2D appearance cues with a first person coordinate embedding, and 2) a 3D Spatial Pathway that focuses on 3D depth and height measurements relative to the person, augmented with brightness reflectance. Retaining two distinct pathways enables effective learning from a limited number of examples, diversified prediction from complementary visual signals, and a flexible architecture that remains functional with RGB images when depth information is unavailable. We show that our model correctly predicts action-objects in first person images, outperforming existing approaches across different datasets.
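To make the two-pathway design described in the abstract concrete, below is a minimal sketch of such an architecture in PyTorch. The class name `TwoPathwayActionObjectNet`, the layer sizes, the fusion by channel concatenation, and the input encodings (RGB plus normalized x/y coordinate channels for the gaze pathway; depth, height, and reflectance channels for the spatial pathway) are illustrative assumptions, not the authors' actual EgoNet implementation.

```python
# Illustrative two-pathway network in the spirit of EgoNet (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    """3x3 convolution with stride 2 (downsampling) followed by ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )


class TwoPathwayActionObjectNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Semantic Gaze Pathway: RGB appearance (3 ch) + first person
        # x/y coordinate embedding (2 ch) = 5 input channels.
        self.gaze_pathway = nn.Sequential(
            conv_block(5, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # 3D Spatial Pathway: depth (1 ch) + height relative to the person (1 ch)
        # + brightness reflectance (1 ch) = 3 input channels.
        self.spatial_pathway = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # Fuse the two feature maps and predict a per-pixel action-object score.
        self.fusion = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )

    def forward(self, rgb_xy, dhr):
        f_gaze = self.gaze_pathway(rgb_xy)     # appearance + coordinate cues
        f_spatial = self.spatial_pathway(dhr)  # depth / height / reflectance cues
        fused = torch.cat([f_gaze, f_spatial], dim=1)
        logits = self.fusion(fused)
        # Upsample back to the input resolution for a dense prediction map.
        return F.interpolate(
            logits, size=rgb_xy.shape[-2:], mode="bilinear", align_corners=False
        )


if __name__ == "__main__":
    rgb_xy = torch.randn(1, 5, 224, 224)  # RGB + normalized (x, y) channels
    dhr = torch.randn(1, 3, 224, 224)     # depth, height, reflectance
    out = TwoPathwayActionObjectNet()(rgb_xy, dhr)
    print(out.shape)  # torch.Size([1, 1, 224, 224])
```

Keeping the two pathways as separate feature extractors, as the abstract notes, also allows the spatial branch to be dropped at inference time when only an RGB image is available.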
Year: 2016
Venue: CoRR
DocType: Journal
Volume: abs/1603.04908
Citations: 3
PageRank: 0.37
References: 38
Authors: 4
Name              Order  Citations  PageRank
Gedas Bertasius   1      169        10.38
Hyun Soo Park     2      70         8.55
Yu, Stella X.     3      877        86.36
Jianbo Shi        4      10207      1031.66