Integrating Scene Semantic Knowledge into Image Captioning - Citegraph

Paper Info

Title
Integrating Scene Semantic Knowledge into Image Captioning

Abstract
Most existing image captioning methods use only the visual information of the image to guide caption generation and lack the guidance of effective scene semantic information; moreover, current visual attention mechanisms cannot adjust the focus intensity on the image. In this article, we first propose an improved visual attention model. At each timestep, we calculate the focus intensity coefficient of the attention mechanism from the context information of the model, then automatically adjust the focus intensity of the attention mechanism through this coefficient to extract more accurate visual information. In addition, we represent the scene semantic knowledge of the image through topic words related to the image scene and add them to the language model. We use the attention mechanism to determine the visual information and scene semantic information that the model attends to at each timestep and combine them so that the model generates more accurate and scene-specific captions. Finally, we evaluate our model on the Microsoft COCO (MSCOCO) and Flickr30k standard datasets. The experimental results show that our approach generates more accurate captions and outperforms many recent advanced models on various evaluation metrics.
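
The idea of an adjustable focus intensity can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not give. It assumes the coefficient acts as a positive multiplier on the raw attention scores before the softmax, so larger values concentrate the weights on fewer image regions. The names `focused_attention`, `attend`, and `intensity` are hypothetical, and the coefficient is passed in directly rather than computed from the model's context.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def focused_attention(scores, intensity):
    """Attention weights with an adjustable focus (assumed form).

    `intensity` is a hypothetical positive coefficient: values above 1
    sharpen the distribution, values below 1 flatten it.
    """
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return softmax([intensity * s for s in scores])


def attend(features, scores, intensity):
    """Return the weights and the weighted sum of region feature vectors."""
    weights = focused_attention(scores, intensity)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    region_scores = [2.0, 1.0, 0.5]  # made-up relevance scores for 3 regions
    print(focused_attention(region_scores, 0.5))  # flatter weights
    print(focused_attention(region_scores, 4.0))  # sharper weights
```

In a captioning decoder, a coefficient of this kind would presumably be predicted at each timestep from the hidden state; that step is omitted here because the abstract does not specify it.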

Year	2021
DOI	10.1145/3439734
Venue	ACM Transactions on Multimedia Computing, Communications, and Applications
Keywords	Image captioning, attention mechanism, scene semantics, encoder-decoder framework
DocType	Journal
Volume	17
Issue	2
ISSN	1551-6857
Citations	2
PageRank	0.41
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Haiyang Wei	1	4	1.11
Zhixin Li	2	12	19.62
Feicheng Huang	3	4	1.81
Canlong Zhang	4	6	2.51
Huifang Ma	5	290	29.69
Zhongzhi Shi	6	9	1.23
