Title
Spatial-Temporal Attention for Image Captioning
Abstract
Inspired by human translation, where a translator cannot produce a word without looking back at the words of the sentence already written, and noting that generating a sentence for an image also requires spatial information, we propose a novel spatial-temporal attention approach that combines previous, current, and visual information. To generate a more accurate sentence for an image, our model decides at each word-generation step whether spatial or temporal information is more important. We evaluate our method on the most popular dataset, Microsoft COCO, and the results show that it performs well.
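The abstract describes the mechanism only at a high level. A minimal sketch of the general idea (with hypothetical names, shapes, and weight matrices; not the paper's actual equations) might look like the following, where a sigmoid gate weighs a spatial context vector (attention over image region features) against a temporal one (attention over previously generated decoder states):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def spatial_temporal_step(h_t, spatial_feats, prev_hiddens, W_s, W_p, w_g):
    """One decoding step of a gated spatial-temporal attention sketch.

    h_t          : (d,)   current decoder hidden state
    spatial_feats: (k, d) image region features (spatial information)
    prev_hiddens : (m, d) decoder states of previously generated words
                          (temporal information)
    W_s, W_p     : (d, d) hypothetical bilinear attention weights
    w_g          : (d,)   hypothetical gate weights
    """
    # Spatial attention: score each image region against the current state.
    spatial_scores = spatial_feats @ W_s @ h_t          # (k,)
    c_spatial = softmax(spatial_scores) @ spatial_feats # (d,)

    # Temporal attention: look back at previously generated words.
    temporal_scores = prev_hiddens @ W_p @ h_t           # (m,)
    c_temporal = softmax(temporal_scores) @ prev_hiddens # (d,)

    # Scalar gate in (0, 1) decides whether spatial or temporal
    # information matters more for generating the next word.
    beta = 1.0 / (1.0 + np.exp(-(w_g @ h_t)))
    return beta * c_spatial + (1.0 - beta) * c_temporal
```

The fused context vector would then feed the word predictor; the gate is what lets the model emphasize visual evidence for content words and linguistic history for function words.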
Year
2018
DOI
10.1109/BigMM.2018.8499060
Venue
2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)
Keywords
Spatial-temporal, Attention, Image captioning
Field
Spatial analysis, Closed captioning, Computer science, Feature extraction, Natural language processing, Artificial intelligence, Decoding methods, Sentence, Semantics
DocType
Conference
ISBN
978-1-5386-5322-7
Citations
0
PageRank
0.34
References
1
Authors
5
Name           Order  Citations  PageRank
Junwei Zhou    1      118        16.64
Xi Wang        2      0          1.69
Jizhong Han    3      355        54.72
Songlin Hu     4      126        30.82
Hongchao Gao   5      0          2.70