Abstract |
---|
Image captioning is a difficult problem that aims to automatically describe the content of an image with appropriate textual descriptions. Chinese captioning in particular remains challenging because of the language's complex semantics and varied expressions. In this paper, we extend research on automated image captioning along the language dimension and propose a novel Chinese image captioning model that uses a double-layer LSTM with an attention mechanism to generate more natural Chinese sentences. In our model, the Inception-v3 network extracts image features, and the attention mechanism, built on the double-layer LSTM, weights these features to predict each word. Experimental results on the AIC-ICC dataset demonstrate that our proposed model generates better Chinese captions, which are more accurate and fluent. Compared with traditional Chinese image captioning algorithms, our method greatly improves captioning performance, achieving BLEU-4 and CIDEr scores of 40.2 and 119.9, respectively. Generated examples also show that the model produces accurate, diverse, and vivid Chinese captions of images. |
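The abstract describes an attention mechanism that weights extracted image features before each word is predicted. As a rough, stdlib-only sketch of that soft-attention step (not the authors' implementation; the scoring function `score_fn`, which in the paper would condition on the double-layer LSTM's hidden state, is a placeholder assumption here):

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scalar scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, score_fn):
    """Soft attention: form a context vector as a weighted sum of region features.

    features: list of region feature vectors (e.g. spatial features from Inception-v3)
    score_fn: maps one feature vector to a scalar relevance score; in the paper this
              would depend on the decoder LSTM's hidden state (assumption here).
    Returns (context vector, attention weights).
    """
    alphas = softmax([score_fn(f) for f in features])
    dim = len(features[0])
    context = [sum(a * f[i] for a, f in zip(alphas, features)) for i in range(dim)]
    return context, alphas

# Toy usage: two "regions"; the scorer prefers the first one.
context, alphas = attend([[1.0, 0.0], [0.0, 1.0]], score_fn=lambda f: f[0])
```

The weights `alphas` always sum to 1, so the context vector stays in the convex hull of the region features; the decoder then consumes this context at each time step.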
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/IJCNN52387.2021.9533463 | 2021 International Joint Conference on Neural Networks (IJCNN) |
Keywords | DocType | ISSN
---|---|---|
Chinese image captioning, attention model, LSTM | Conference | 2161-4393 |
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0 |
Authors (2)
Name | Order | Citations | PageRank |
---|---|---|---|
Wu Wei | 1 | 204 | 14.84 |
Deshuai Sun | 2 | 0 | 0.34 |