Title
Improving image captioning with Pyramid Attention and SC-GAN
Abstract
Most existing image captioning models mainly use global attention, which represents the whole image features; local attention, which represents object features; or a combination of the two. Few models integrate the relationship information between the various object regions of the image, yet this relationship information is also very instructive for caption generation. For example, if a football appears, there is a high probability that the image also contains people near the football. In this article, the relationship feature is embedded into global-local attention to construct a new Pyramid Attention mechanism, which can explore the internal visual and semantic relationships between different object regions. In addition, to alleviate the exposure bias problem and make the training process more efficient, we propose a new method to apply the Generative Adversarial Network to sequence generation. The greedy decoding method is used to generate an efficient baseline reward for self-critical training. Finally, experiments on the MSCOCO dataset show that the model can generate more accurate and vivid captions and outperforms many recent advanced models on various prevailing evaluation metrics on both local and online test sets.
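For context, the self-critical step mentioned in the abstract follows the usual REINFORCE-with-baseline pattern: the sentence-level reward of a greedy-decoded caption is subtracted from the reward of a sampled caption, and the difference weights the log-likelihood of the sampled tokens. Below is a minimal illustrative sketch in PyTorch, not the authors' implementation; the function self_critical_loss and all tensors are hypothetical stand-ins.

import torch

def self_critical_loss(log_probs, sampled_reward, greedy_reward, mask):
    # Advantage: sentence-level reward of the sampled caption minus the
    # reward of the greedy-decoded caption, which acts as the baseline.
    advantage = (sampled_reward - greedy_reward).unsqueeze(1)   # (B, 1)
    # REINFORCE-style objective: advantage-weighted log-likelihood of the
    # sampled tokens, averaged over non-padding positions.
    loss = -(advantage * log_probs * mask).sum() / mask.sum()
    return loss

# Toy usage with stand-in tensors (no real captioner or CIDEr scorer).
B, T = 4, 12
log_probs = -torch.rand(B, T)      # stand-in for log p(w_t) of sampled tokens
mask = torch.ones(B, T)            # 1 for real tokens, 0 for padding
sampled_reward = torch.rand(B)     # e.g. CIDEr of sampled captions
greedy_reward = torch.rand(B)      # e.g. CIDEr of greedy captions (baseline)
print(self_critical_loss(log_probs, sampled_reward, greedy_reward, mask))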
Year
2022
DOI
10.1016/j.imavis.2021.104340
Venue
Image and Vision Computing
Keywords
Image captioning, Pyramid Attention network, Self-critical training, Reinforcement learning, Generative adversarial network, Sequence-level learning
DocType
Journal
Volume
117
ISSN
0262-8856
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order  Citations  PageRank
Tianyu Chen     1      0          0.34
Zhixin Li       2      0          0.68
Jingli Wu       3      3          3.15
Huifang Ma      4      0          1.35
Bianping Su     5      0          0.34