Title
Collaborative strategy network for spatial attention image captioning
Abstract
Automatic image captioning is a task at the intersection of computer vision and natural language processing. Although image captioning based on reinforcement learning has made significant progress in the past few years, the evaluation metrics used for training and testing remain inconsistent. Reinforcement learning optimizes a single metric, so the captions generated by the model are monotonous, lack distinguishing characteristics, and do not reflect the diversity among images. To address these problems, we design a novel image captioning model based on lightweight spatial attention and a generative adversarial network. The lightweight spatial attention module discards the coarse-grained approach of max pooling after convolution and instead transforms the spatial information to preserve key information in the feature map. The game mechanism between the generator and the discriminator is then used to optimize the evaluation metric of the model. Finally, we design a discriminator network that cooperates with reinforcement learning to update the model parameters and reduce the inconsistency between the language metrics used in training and testing. We verified the effectiveness of the proposed model on the MS-COCO and Flickr 30K datasets. The experimental results show that the proposed model achieves state-of-the-art results.
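The abstract describes a discriminator that cooperates with reinforcement learning so that training is not driven by a single language metric. It does not give the reward formula, so the following is only a minimal sketch under stated assumptions: the reward is taken to be a convex combination of a language-metric score and a discriminator score, and a baseline caption's reward is subtracted to reduce variance. The function names, the weight `lam`, and the baseline subtraction are all hypothetical and are not taken from the paper.

```python
def mixed_reward(metric_score, disc_score, lam=0.5):
    """Blend a language-metric score with a discriminator score.

    Assumption: both scores are already scaled to [0, 1], and `lam`
    is a hypothetical weight on the discriminator term.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * disc_score + (1.0 - lam) * metric_score


def advantage(sampled, baseline, lam=0.5):
    """Reward of a sampled caption minus reward of a baseline caption.

    Each argument is a (metric_score, disc_score) pair. Subtracting a
    baseline is a common variance-reduction choice in policy-gradient
    training; the abstract does not state that the authors use it.
    """
    return mixed_reward(*sampled, lam=lam) - mixed_reward(*baseline, lam=lam)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(advantage(sampled=(0.8, 0.4), baseline=(0.6, 0.2), lam=0.5))
```

In a policy-gradient update, a value like this would scale the log-probability gradient of the sampled caption, so captions that score well on both the metric and the discriminator are reinforced.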
Year: 2022
DOI: 10.1007/s10489-021-02943-w
Venue: Applied Intelligence
Keywords: Generative adversarial network, Attention mechanism, Image captioning, Reinforcement learning
DocType: Journal
Volume: 52
Issue: 8
ISSN: 0924-669X
Citations: 0
PageRank: 0.34
References: 15
Authors: 3
Name | Order | Citations | PageRank
Dongming Zhou | 1 | 2 | 3.40
Jing Yang | 2 | 0 | 0.68
Riqiang Bao | 3 | 0 | 0.34