Abstract |
---|
Most existing image captioning models mainly use global attention, which represents whole-image features, local attention, which represents object features, or a combination of the two; few models integrate the relationship information between the various object regions of an image. Yet this relationship information is also very instructive for caption generation: for example, if a football appears, there is a high probability that the image also contains people near the football. In this article, the relationship feature is embedded into global-local attention to construct a new Pyramid Attention mechanism, which can explore the internal visual and semantic relationships between different object regions. In addition, to alleviate the exposure bias problem and make training more efficient, we propose a new method to apply the Generative Adversarial Network to sequence generation. Greedy decoding is used to generate an efficient baseline reward for self-critical training. Finally, experiments on the MSCOCO dataset show that the model can generate more accurate and vivid captions and outperforms many recent advanced models on various prevailing evaluation metrics on both the local and online test sets. |
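The abstract mentions using greedy decoding to provide the baseline reward for self-critical training. As a point of reference only, the sketch below illustrates the general self-critical (SCST-style) advantage computation that this describes; it is not the paper's implementation, and the function name `self_critical_loss`, the tensor shapes, and the toy reward values are assumptions for illustration.

```python
import torch

def self_critical_loss(sample_logprobs, sample_rewards, greedy_rewards):
    """Policy-gradient loss with a greedy-decoding baseline (SCST-style sketch).

    sample_logprobs: (batch, seq_len) log-probabilities of the sampled caption tokens
    sample_rewards:  (batch,) sentence-level reward (e.g. CIDEr) of the sampled captions
    greedy_rewards:  (batch,) reward of the greedy-decoded captions, used as the baseline
    """
    # Advantage = reward of the sampled caption minus the greedy baseline reward.
    advantage = (sample_rewards - greedy_rewards).unsqueeze(1)  # (batch, 1)
    # Maximizing expected reward == minimizing negative advantage-weighted log-likelihood.
    return -(advantage.detach() * sample_logprobs).mean()

# Toy usage with made-up numbers (real rewards would come from a caption scorer such as CIDEr).
logp = torch.log(torch.tensor([[0.7, 0.6, 0.8], [0.5, 0.4, 0.9]]))
loss = self_critical_loss(logp, torch.tensor([1.2, 0.8]), torch.tensor([1.0, 1.0]))
print(loss.item())
```

Captions whose reward exceeds the greedy baseline receive a positive advantage and are reinforced, while worse-than-greedy samples are penalized, which is what removes the need for a separately learned reward baseline.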
Year | DOI | Venue |
---|---|---|
2022 | 10.1016/j.imavis.2021.104340 | Image and Vision Computing |
Keywords | DocType | Volume
---|---|---
Image captioning, Pyramid Attention network, Self-critical training, Reinforcement learning, Generative adversarial network, Sequence-level learning | Journal | 117

ISSN | Citations | PageRank
---|---|---
0262-8856 | 0 | 0.34

References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Tianyu Chen | 1 | 0 | 0.34 |
Zhixin Li | 2 | 0 | 0.68 |
Jingli Wu | 3 | 3 | 3.15 |
Huifang Ma | 4 | 0 | 1.35 |
Bianping Su | 5 | 0 | 0.34 |