Discovering hierarchical object models from captioned images

Paper Info

Title
Discovering hierarchical object models from captioned images

Abstract
We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.

Year	2012
DOI	10.1016/j.cviu.2012.03.002
Venue	Computer Vision and Image Understanding
Keywords	associated caption,captioned image,hierarchical configuration,better localization,object model,previous work,subsequent image annotation,poor object localization,perceptual grouping,hierarchical object model,greater spatial extent,previous framework,automatic image annotation,object recognition
Field	Computer science,Learning object,Artificial intelligence,Ambiguity,Computer vision,Annotation,Automatic image annotation,Pattern recognition,Precision and recall,Invariant (mathematics),Machine learning,Cognitive neuroscience of visual object recognition,Bounding overwatch
DocType	Journal
Volume	116
Issue	7
ISSN	1077-3142
Citations	1
PageRank	0.40
References	23
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Michael Jamieson	1	29	3.72
Yulia Eskin	2	3	1.48
Afsaneh Fazly	3	213	26.99
Suzanne Stevenson	4	566	64.31
Sven J. Dickinson	5	2836	185.12
