Title
A hierarchical recurrent approach to predict scene graphs from a visual‐attention‐oriented perspective
Abstract
A scene graph provides a powerful intermediate knowledge structure for various visual tasks, including semantic image retrieval, image captioning, and visual question answering. In this paper, the task of predicting a scene graph for an image is formulated as two connected problems, i.e., recognizing the relationship triplets, structured as <subject-predicate-object>, and constructing the scene graph from the recognized relationship triplets. For relationship triplet recognition, we develop a novel hierarchical recurrent neural network with a visual attention mechanism. This model is composed of two attention-based recurrent neural networks in a hierarchical organization. The first network generates a topic vector for each relationship triplet, whereas the second network predicts each word in that relationship triplet given the topic vector. This approach successfully captures the compositional structure and contextual dependency of an image and the relationship triplets describing its scene. For scene graph construction, we present an entity localization approach that uses the available attention information to determine the graph structure. We then give an algorithm that automatically converts the generated relationship triplets into a scene graph. Extensive experimental results on two widely used data sets verify the feasibility of the proposed approach.
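The abstract describes converting the generated triplets into a scene graph with the help of attention-based entity localization, but gives no further detail. The following is a minimal sketch under our own assumptions, not the authors' algorithm: each subject or object word is assumed to come with a 2-D attention map, the entity is localized at the peak of that map, and two mentions are merged into one graph node when they share a label and their peaks lie within a hypothetical `radius` (in grid cells). All names (`Entity`, `SceneGraph`, `attention_peak`, `find_or_add`, `triplets_to_graph`, `onehot`) are hypothetical and for illustration only.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entity:
    label: str
    loc: tuple  # (row, col) of the attention peak (assumed localization rule)


@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)  # list of Entity
    edges: list = field(default_factory=list)  # (subject_idx, predicate, object_idx)


def attention_peak(att):
    """Return (row, col) of the largest weight in a 2-D attention map."""
    cells = ((r, c) for r in range(len(att)) for c in range(len(att[0])))
    return max(cells, key=lambda rc: att[rc[0]][rc[1]])


def find_or_add(graph, label, loc, radius):
    """Reuse a node with the same label whose peak lies within `radius`
    grid cells (hypothetical merge rule); otherwise add a new node."""
    for i, node in enumerate(graph.nodes):
        if node.label == label and math.dist(node.loc, loc) <= radius:
            return i
    graph.nodes.append(Entity(label, loc))
    return len(graph.nodes) - 1


def triplets_to_graph(triplets, radius=1.5):
    """Build a scene graph from (subj, subj_att, predicate, obj, obj_att) tuples."""
    graph = SceneGraph()
    for subj, s_att, pred, obj, o_att in triplets:
        s = find_or_add(graph, subj, attention_peak(s_att), radius)
        o = find_or_add(graph, obj, attention_peak(o_att), radius)
        edge = (s, pred, o)
        if edge not in graph.edges:  # skip duplicate relationships
            graph.edges.append(edge)
    return graph


def onehot(r, c, n=3):
    """Toy n x n attention map with all weight on cell (r, c)."""
    return [[1.0 if (i, j) == (r, c) else 0.0 for j in range(n)] for i in range(n)]


# Two "man" mentions with nearby peaks collapse into a single node.
g = triplets_to_graph([
    ("man", onehot(0, 0), "riding", "horse", onehot(2, 2)),
    ("man", onehot(0, 1), "wearing", "hat", onehot(0, 0)),
    ("horse", onehot(2, 2), "on", "grass", onehot(2, 1)),
])
print([n.label for n in g.nodes])  # ['man', 'horse', 'hat', 'grass']
print(g.edges)  # [(0, 'riding', 1), (0, 'wearing', 2), (1, 'on', 3)]
```

Merging on label plus attention-peak proximity is only one simple way to decide whether two mentions such as "man" denote the same entity; the distance threshold is an assumption, and the paper's own localization procedure may differ.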
Year: 2019
DOI: 10.1111/coin.12202
Venue: Computational Intelligence
Keywords: hierarchical recurrent neural network, relationship triplet recognition, scene graph, visual attention mechanism
Field: Graph, Scene graph, Computer science, Speech recognition, Visual attention
DocType: Journal
Volume: 35
Issue: SP3.0
ISSN: 0824-7935
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name            Order  Citations  PageRank
Wenjing Gao     1      0          1.01
Yonghua Zhu     2      0          1.35
Wenjun Zhang    3      1789       177.28
Ke Zhang        4      0          1.01
Honghao Gao     5      217        45.24