Enhanced Soft Attention Mechanism With An Inception-Like Module For Image Captioning - Citegraph

Paper Info

Title
Enhanced Soft Attention Mechanism With An Inception-Like Module For Image Captioning

Abstract
Visual soft attention has been widely adopted in image captioning models. The Traditional Soft Attention Mechanism (TSAM) assigns a weight to each region using a multi-layer perceptron whose input is only that region's own features. Because image classification networks extract regional features according to spatial location, TSAM fails to adequately consider the spatial context of each region, which leads to an unreasonable weight distribution. In this paper, we introduce a flexible and universal attention framework with an inception-like module, named the Enhanced Soft Attention Mechanism (ESAM), which balances the attention levels of adjacent regions and alleviates the problems caused by local features with weak representational ability. Furthermore, we add an LSTM to the attention module so that it takes the previous attention distribution into account when generating the current word. Experimental results show that ESAM significantly surpasses TSAM by 4.1% on BLEU-4 and 2.7% on CIDEr, and also achieves better results in experiments that verify its universality under the same setups.
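The abstract contrasts TSAM, which scores each region from its own feature alone, with ESAM, which lets neighbouring regions influence one another's attention weights. The plain-Python sketch below illustrates that contrast under loudly stated assumptions: the function names (`tsam_scores`, `box_smooth`, `inception_like`), the additive scoring MLP, and the use of fixed mean filters of sizes 1, 3 and 5 as stand-ins for the inception-like module's learned convolutions are all hypothetical choices, not the authors' implementation. The attention LSTM is omitted.

```python
import math
import random
from typing import List, Sequence

Grid = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tsam_scores(features, hidden, W_v, W_h, w) -> List[float]:
    """Assumed TSAM-style additive scoring: score_i = w . tanh(W_v v_i + W_h h).

    Each region's score depends only on its own feature v_i and the decoder
    state h, never on neighbouring regions.
    """
    scores = []
    for v in features:
        pre = [
            sum(a * b for a, b in zip(row_v, v)) + sum(a * b for a, b in zip(row_h, hidden))
            for row_v, row_h in zip(W_v, W_h)
        ]
        scores.append(sum(wi * math.tanh(p) for wi, p in zip(w, pre)))
    return scores


def box_smooth(grid: Grid, k: int) -> Grid:
    """Mean over the k x k window around each cell, clipped at the grid border.

    k = 1 returns the grid unchanged; larger k mixes in more spatial context.
    """
    H, W = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            window = [
                grid[a][b]
                for a in range(max(0, i - r), min(H, i + r + 1))
                for b in range(max(0, j - r), min(W, j + r + 1))
            ]
            out[i][j] = sum(window) / len(window)
    return out


def inception_like(scores: Sequence[float], H: int, W: int, kernels=(1, 3, 5)) -> List[float]:
    """Hypothetical stand-in for the inception-like module: parallel branches
    with different receptive fields over the H x W score map, averaged."""
    grid = [list(scores[i * W:(i + 1) * W]) for i in range(H)]
    branches = [box_smooth(grid, k) for k in kernels]
    return [sum(b[i][j] for b in branches) / len(branches) for i in range(H) for j in range(W)]


if __name__ == "__main__":
    random.seed(0)
    H, W, D, A = 3, 3, 4, 8  # 3x3 region grid, feature/hidden dim 4, attention dim 8
    feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(H * W)]
    h = [random.gauss(0, 1) for _ in range(D)]
    W_v = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(A)]
    W_h = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(A)]
    w = [random.gauss(0, 1) for _ in range(A)]

    raw = tsam_scores(feats, h, W_v, W_h, w)
    alpha_tsam = softmax(raw)                       # per-region weights, no spatial context
    alpha_ctx = softmax(inception_like(raw, H, W))  # weights after neighbourhood mixing
    print("TSAM:", [round(a, 3) for a in alpha_tsam])
    print("context-smoothed:", [round(a, 3) for a in alpha_ctx])
```

Averaging the branches pulls an isolated spike toward its neighbours: for a 3x3 score map that is zero except for a central score of 9, the smoothed centre becomes (9 + 1 + 1) / 3, roughly 3.67, which is one simple way to picture "balancing the attention levels of adjacent regions".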

Year: 2020
DOI: 10.1109/ICTAI50040.2020.00119
Venue: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
Keywords: image captioning, soft attention, inception
Doc type: Conference
ISSN: 1082-3409
Citations: 0
PageRank: 0.34
References: 0
Authors (4)

Name	Order	Citations	PageRank
Zheng Lian	1	12	8.33
Haichang Li	2	0	2.03
Rui Wang	3	139	53.65
Xiaohui Hu	4	17	8.10
