Title
First Person Action-Object Detection with EgoNet.
Abstract
Objects afford visual sensation and motor actions. A first person camera, placed at a person's head, captures unscripted moments of our visual sensorimotor object interactions. Can a single first person image tell us about our momentary visual attention and motor action with objects, without a gaze tracking device or tactile sensors? To study the holistic correlation of visual attention with motor action, we introduce the concept of action-objects---objects associated with seeing and touching actions, which exhibit a characteristic 3D spatial distance and orientation with respect to the person. A predictive action-object model is designed to re-organize the space of interactions in terms of visual and tactile sensations, and is realized by our proposed EgoNet network. EgoNet is composed of two convolutional neural networks: 1) a Semantic Gaze Pathway that learns 2D appearance cues with a first person coordinate embedding, and 2) a 3D Spatial Pathway that focuses on 3D depth and height measurements relative to the person, augmented with brightness reflectance. Retaining two distinct pathways enables effective learning from a limited number of examples, diversified prediction from complementary visual signals, and a flexible architecture that remains functional with RGB images when depth information is unavailable. We show that our model correctly predicts action-objects in first person images, outperforming existing approaches across different datasets.
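To make the two-pathway design described in the abstract concrete, below is a minimal sketch of such an architecture in PyTorch. The class name `TwoPathwayActionObjectNet`, the layer sizes, the fusion by channel concatenation, and the input encodings (RGB plus normalized x/y coordinate channels for the gaze pathway; depth, height, and reflectance channels for the spatial pathway) are illustrative assumptions, not the authors' actual EgoNet implementation.

```python
# Illustrative two-pathway network in the spirit of EgoNet (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    """3x3 convolution with stride 2 (downsampling) followed by ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )


class TwoPathwayActionObjectNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Semantic Gaze Pathway: RGB appearance (3 ch) + first person
        # x/y coordinate embedding (2 ch) = 5 input channels.
        self.gaze_pathway = nn.Sequential(
            conv_block(5, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # 3D Spatial Pathway: depth (1 ch) + height relative to the person (1 ch)
        # + brightness reflectance (1 ch) = 3 input channels.
        self.spatial_pathway = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # Fuse the two feature maps and predict a per-pixel action-object score.
        self.fusion = nn.Sequential(
            nn.Conv2d(256, 128, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )

    def forward(self, rgb_xy, dhr):
        f_gaze = self.gaze_pathway(rgb_xy)     # appearance + coordinate cues
        f_spatial = self.spatial_pathway(dhr)  # depth / height / reflectance cues
        fused = torch.cat([f_gaze, f_spatial], dim=1)
        logits = self.fusion(fused)
        # Upsample back to the input resolution for a dense prediction map.
        return F.interpolate(
            logits, size=rgb_xy.shape[-2:], mode="bilinear", align_corners=False
        )


if __name__ == "__main__":
    rgb_xy = torch.randn(1, 5, 224, 224)  # RGB + normalized (x, y) channels
    dhr = torch.randn(1, 3, 224, 224)     # depth, height, reflectance
    out = TwoPathwayActionObjectNet()(rgb_xy, dhr)
    print(out.shape)  # torch.Size([1, 1, 224, 224])
```

Keeping the two pathways as separate feature extractors, as the abstract notes, also allows the spatial branch to be dropped at inference time when only an RGB image is available.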
Year: 2016
Venue: CoRR
DocType: Journal
Volume: abs/1603.04908
Citations: 3
PageRank: 0.37
References: 38
Authors: 4
Name              Order  Citations  PageRank
Gedas Bertasius   1      169        10.38
Hyun Soo Park     2      70         8.55
Yu, Stella X.     3      877        86.36
Jianbo Shi        4      10207      1031.66