Abstract |
---|
We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1016/j.cviu.2012.03.002 | Computer Vision and Image Understanding |

Keywords | Field | DocType |
---|---|---|
associated caption,captioned image,hierarchical configuration,better localization,object model,previous work,subsequent image annotation,poor object localization,perceptual grouping,hierarchical object model,greater spatial extent,previous framework,automatic image annotation,object recognition | Computer science,Learning object,Artificial intelligence,Ambiguity,Computer vision,Annotation,Automatic image annotation,Pattern recognition,Precision and recall,Invariant (mathematics),Machine learning,Cognitive neuroscience of visual object recognition,Bounding overwatch | Journal |

Volume | Issue | ISSN |
---|---|---|
116 | 7 | 1077-3142 |

Citations | PageRank | References |
---|---|---|
1 | 0.40 | 23 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michael Jamieson | 1 | 29 | 3.72 |
Yulia Eskin | 2 | 3 | 1.48 |
Afsaneh Fazly | 3 | 213 | 26.99 |
Suzanne Stevenson | 4 | 566 | 64.31 |
Sven J. Dickinson | 5 | 2836 | 185.12 |