Title
Learning Image Representations Tied to Egomotion from Unlabeled Video
Abstract
Understanding how images of objects and scenes behave in response to specific egomotions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images. We propose a new “embodied” visual learning paradigm, exploiting proprioceptive motor signals to train visual representations from egocentric video with no manual supervision. Specifically, we enforce that our learned features exhibit equivariance, i.e., that they respond predictably to transformations associated with distinct egomotions. With three datasets, we show that our unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks. In the most challenging test, we show that features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.
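The abstract's key mechanism, enforcing that features respond predictably to transformations associated with distinct egomotions, can be illustrated with a small sketch. The following is a minimal, hypothetical PyTorch example in which each discretized egomotion g is assigned a learned linear map M_g and frame-pair features are trained so that M_g z(x_t) approximates z(x_{t+1}); the class names, network, dimensions, and plain MSE pairing loss here are illustrative assumptions, not the authors' exact architecture or objective.

```python
import torch
import torch.nn as nn

# Hypothetical sketch (not the authors' released code): learn features z(x)
# such that a discretized egomotion g acts on feature space as a learned
# linear map M_g, i.e., M_g z(x_t) ~ z(x_{t+1}) for frame pairs related by g.

class EquivariantFeatures(nn.Module):
    def __init__(self, feat_dim=64, n_motions=6):
        super().__init__()
        # Small convolutional encoder standing in for the paper's CNN.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # One learned linear map per discretized egomotion, initialized
        # to the identity so training starts from "no transformation".
        self.motion_maps = nn.Parameter(
            torch.eye(feat_dim).repeat(n_motions, 1, 1))

    def forward(self, frame_t, frame_t1, motion_id):
        z_t = self.encoder(frame_t)       # features before the egomotion
        z_t1 = self.encoder(frame_t1)     # features after the egomotion
        M = self.motion_maps[motion_id]   # (B, D, D), one map per pair
        z_pred = torch.bmm(M, z_t.unsqueeze(-1)).squeeze(-1)
        # Equivariance penalty: the mapped old features should predict
        # the new features. (A plain MSE stand-in for the paper's loss.)
        return nn.functional.mse_loss(z_pred, z_t1)

# Shape check on random stand-in data (real training would use frame
# pairs from egocentric video plus their motor/odometry signals).
model = EquivariantFeatures()
x_t, x_t1 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
g = torch.randint(0, 6, (8,))
loss = model(x_t, x_t1, g)
loss.backward()
```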
Year: 2017
DOI: https://doi.org/10.1007/s11263-017-1001-2
Venue: International Journal of Computer Vision
Keywords: Feature Space, Convolutional Neural Network, Feature Learning, Temporal Coherence, Scene Recognition
Field: Computer vision, Feature vector, Disjoint sets, Convolutional neural network, Computer science, Embodied cognition, Visual recognition, Artificial intelligence, Visual learning, Feature learning, Machine learning
DocType: Journal
Volume: 125
Issue: 1-3
ISSN: 0920-5691
Citations: 6
PageRank: 0.68
References: 44
Authors: 2

Name              Order  Citations  PageRank
Dinesh Jayaraman  1      318        15.69
Kristen Grauman   2      6258       326.34