Title
Enhanced Soft Attention Mechanism With An Inception-Like Module For Image Captioning
Abstract
Visual soft attention has been widely adopted in image captioning models. The traditional soft attention mechanism (TSAM) assigns a weight to each region using a multi-layer perceptron whose only input is that region's own features. Because image classification networks extract regional features according to spatial location, TSAM fails to adequately consider the spatial context of each region, which leads to an unreasonable weight distribution. In this paper, we introduce a flexible and universal attention framework with an inception-like module, named the Enhanced Soft Attention Mechanism (ESAM), which balances the attention levels of adjacent regions and alleviates the problem caused by local features with weak representational ability. Furthermore, we add an LSTM to the attention module so that it can take the previous attention distribution into account while generating the current word. Experimental results show that ESAM surpasses TSAM by 4.1% on BLEU-4 and 2.7% on CIDEr, and that it also achieves better results in the universality tests under the same experimental setups.
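The contrast the abstract draws can be illustrated with a minimal sketch. The first part scores each region from its own features only, as the abstract describes for TSAM. The second part is a hypothetical stand-in for the idea of balancing adjacent regions: it mixes a 1x1 branch and a 3x3 averaging branch over a grid of scores. It is not the ESAM module from the paper, whose details are not given in the abstract. All function names, weight shapes, and the 0.5/0.5 mixing are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mlp_score(feature, hidden, w_v, w_h, w_out):
    """Score one region from its own features and the decoder state.

    w_v and w_h hold one row per hidden unit of the scoring MLP;
    w_out maps the tanh units to a scalar (assumed shapes).
    """
    units = [math.tanh(dot(rv, feature) + dot(rh, hidden))
             for rv, rh in zip(w_v, w_h)]
    return dot(w_out, units)

def soft_attention(features, hidden, w_v, w_h, w_out):
    """Region-wise soft attention: independent scores, softmax, weighted sum."""
    scores = [mlp_score(f, hidden, w_v, w_h, w_out) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context

def neighborhood_mean(grid, r, c, radius):
    """Mean score over the (2*radius+1)^2 window around (r, c), clipped at edges."""
    rows, cols = len(grid), len(grid[0])
    vals = [grid[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    return sum(vals) / len(vals)

def multi_scale_scores(grid):
    """Hypothetical two-branch smoothing: 1x1 branch plus 3x3 mean branch."""
    return [[0.5 * grid[r][c] + 0.5 * neighborhood_mean(grid, r, c, 1)
             for c in range(len(grid[0]))]
            for r in range(len(grid))]
```

In this sketch, a single isolated high score on the grid is pulled down and its neighbors are pulled up before the softmax, which is one simple way to make adjacent regions receive more balanced attention.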
Year: 2020
DOI: 10.1109/ICTAI50040.2020.00119
Venue: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
Keywords: image captioning, soft attention, inception
DocType: Conference
ISSN: 1082-3409
Citations: 0
PageRank: 0.34
References: 0
Authors (4)
Name          Order  Citations  PageRank
Zheng Lian    1      12         8.33
Haichang Li   2      0          2.03
Rui Wang      3      139        53.65
Xiaohui Hu    4      17         8.10