Title |
---|
Be Specific, Be Clear: Bridging Machine and Human Captions by Scene-Guided Transformer |
Abstract |
---|
Automatically generating natural language descriptions for images, i.e., image captioning, is one of the primary goals of multimedia understanding. The recent success of deep neural networks in image captioning has been accompanied by region-based bottom-up-attention features. Region-based features represent the contents of local regions but lack an overall understanding of the image, which is critical for more specific and clear language expression. Visual scene perception can facilitate this overall understanding and provide prior knowledge for generating specific and clear captions of objects, object relations, and overall image scenes. In this paper, we propose a Scene-Guided Transformer (SG-Transformer) model that leverages scene-level global context to generate more specific and descriptive image captions. SG-Transformer adopts an encoder-decoder architecture. The encoder aggregates global scene context, as external knowledge, with object region-based features during attention learning to facilitate reasoning about object relations. It also incorporates high-level auxiliary scene-guided tasks to learn more specific visual representations. The decoder then integrates both the object-level and scene-level information refined by the encoder for an overall perception of the image. Extensive experiments on the MSCOCO and Flickr30k benchmarks show the superiority and generality of SG-Transformer. In addition, the proposed scene-guided approach can enrich both object-level and scene-graph visual representations in the encoder and generalizes to both RNN- and Transformer-based decoders. |
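The abstract describes an encoder that aggregates global scene context with region-based features inside attention. A minimal sketch of one way this could look is given below. It assumes, for illustration only, that scene context arrives as extra vectors appended to the keys and values of scaled dot-product self-attention over region features, with no learned projections or multiple heads. The names `attend` and `scene_guided_attention` are hypothetical, and this is not the SG-Transformer implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: one output vector per query."""
    d = len(keys[0])
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_out)])
    return outputs


def scene_guided_attention(regions: List[Vector], scene_context: List[Vector]) -> List[Vector]:
    """Hypothetical scene-guided attention: region queries attend over the
    region features plus appended scene-context vectors (external knowledge)."""
    memory = regions + scene_context  # scene vectors become extra key/value slots
    return attend(regions, memory, memory)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # toy region features
    scene = [[0.5, 0.5]]                # toy global scene vector
    print(scene_guided_attention(regions, scene))
```

Because the scene vectors enter only as keys and values, the output still has one vector per region, while each region's representation can draw on the global context.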
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3463945.3469054 | International Multimedia Conference |
Keywords | DocType | Citations |
---|---|---|
image captioning, scene, context | Conference | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yupan Huang | 1 | 0 | 1.35 |
Zhaoyang Zeng | 2 | 1 | 2.06 |
Yutong Lu | 3 | 307 | 53.61 |