Collaborative strategy network for spatial attention image captioning - Citegraph

Paper Info

Title
Collaborative strategy network for spatial attention image captioning

Abstract
Automatic image captioning lies at the intersection of computer vision and natural language processing. Although image captioning based on reinforcement learning has made significant progress in the past few years, the evaluation metrics used for training and testing remain inconsistent. Reinforcement learning optimizes a single metric, so the captions generated by the model are monotonous, lack distinctive characteristics, and fail to reflect the diversity among images. To address these problems, we design a novel image captioning model based on lightweight spatial attention and a generative adversarial network. The lightweight spatial attention module discards the coarse-grained approach of max pooling after convolution and instead transforms the spatial information to preserve key information in the feature map. Then, the game mechanism between the generator and the discriminator is used to optimize the evaluation metric of the model. Finally, we design a discriminator network that cooperates with reinforcement learning to update the model parameters and reduce the inconsistency between the language metrics used in training and testing. We verified the effectiveness of the proposed model on the MS-COCO and Flickr 30K datasets. The experimental results show that the proposed model achieves state-of-the-art results.

Field	Value
Year	2022
DOI	10.1007/s10489-021-02943-w
Venue	Applied Intelligence
Keywords	Generative adversarial network, Attention mechanism, Image captioning, Reinforcement learning
DocType	Journal
Volume	52
Issue	8
ISSN	0924-669X
Citations	0
PageRank	0.34
References	15
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Dongming Zhou	1	2	3.40
Jing Yang	2	0	0.68
Riqiang Bao	3	0	0.34
