Title
Discovering hierarchical object models from captioned images
Abstract
We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.
Year: 2012
DOI: 10.1016/j.cviu.2012.03.002
Venue: Computer Vision and Image Understanding
Keywords: associated caption, captioned image, hierarchical configuration, better localization, object model, previous work, subsequent image annotation, poor object localization, perceptual grouping, hierarchical object model, greater spatial extent, previous framework, automatic image annotation, object recognition
Field: Computer science, Learning object, Artificial intelligence, Ambiguity, Computer vision, Annotation, Automatic image annotation, Pattern recognition, Precision and recall, Invariant (mathematics), Machine learning, Cognitive neuroscience of visual object recognition, Bounding overwatch
DocType: Journal
Volume: 116
Issue: 7
ISSN: 1077-3142
Citations: 1
PageRank: 0.40
References: 23
Authors: 5
Name | Order | Citations | PageRank
Michael Jamieson | 1 | 29 | 3.72
Yulia Eskin | 2 | 3 | 1.48
Afsaneh Fazly | 3 | 213 | 26.99
Suzanne Stevenson | 4 | 566 | 64.31
Sven J. Dickinson | 5 | 2836 | 185.12