Abstract
---
Inspired by work on human translation, where a translator cannot generate a word without looking back at the previous words of the sentence, we note that generating a sentence for an image additionally requires spatial information. In this paper, we propose a novel spatial-temporal attention approach that combines previous, current, and visual information. To produce a more accurate sentence for an image, our model decides whether spatial or temporal information is more important during the generation of each word. In experiments, we evaluate our method on the most popular image captioning dataset, Microsoft COCO; the results show that our method performs well.
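The abstract does not give the model's exact formulation, so the following is only a minimal sketch of a gated spatial-temporal attention step under assumed shapes: spatial attention over image region features, temporal attention over previously generated decoder states, and a learned scalar gate that weighs the two contexts per word. The class name `SpatialTemporalAttention`, all layer names, and the dimensions are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch, NOT the paper's model. Assumed inputs:
#   V:   (batch, regions, dv)  spatial CNN features of the image
#   H:   (batch, steps, dh)    previous decoder hidden states (temporal)
#   h_t: (batch, dh)           current decoder state (query)
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    def __init__(self, dv, dh, da):
        super().__init__()
        self.q_proj = nn.Linear(dh, da)   # query from current state
        self.v_proj = nn.Linear(dv, da)   # keys for spatial (visual) features
        self.h_proj = nn.Linear(dh, da)   # keys for temporal (previous) states
        self.v_score = nn.Linear(da, 1)
        self.h_score = nn.Linear(da, 1)
        self.v_out = nn.Linear(dv, dh)    # map visual context into decoder space
        self.gate = nn.Linear(3 * dh, 1)  # decides spatial vs. temporal weight

    def forward(self, V, H, h_t):
        q = self.q_proj(h_t).unsqueeze(1)                      # (B, 1, da)
        # Spatial attention over image regions
        a_v = torch.softmax(self.v_score(torch.tanh(self.v_proj(V) + q)), dim=1)
        c_v = self.v_out((a_v * V).sum(dim=1))                 # (B, dh)
        # Temporal attention over previously generated states
        a_h = torch.softmax(self.h_score(torch.tanh(self.h_proj(H) + q)), dim=1)
        c_h = (a_h * H).sum(dim=1)                             # (B, dh)
        # Scalar gate: how much spatial vs. temporal context for this word
        beta = torch.sigmoid(self.gate(torch.cat([h_t, c_v, c_h], dim=-1)))
        return beta * c_v + (1 - beta) * c_h                   # fused context

# Usage on dummy tensors
V = torch.randn(2, 49, 512)   # e.g. a 7x7 grid of visual features
H = torch.randn(2, 5, 256)    # five previous decoder states
h_t = torch.randn(2, 256)
ctx = SpatialTemporalAttention(512, 256, 128)(V, H, h_t)
print(ctx.shape)              # torch.Size([2, 256])
```

The gate `beta` is one plausible reading of "decides whether the spatial or temporal information is more important": a single sigmoid scalar interpolating between the two context vectors, conditioned on the current decoder state and both contexts.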
Year | DOI | Venue
---|---|---
2018 | 10.1109/BigMM.2018.8499060 | 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)

Keywords | Field | DocType
---|---|---
Spatial-temporal, Attention, Image captioning | Spatial analysis, Closed captioning, Computer science, Feature extraction, Natural language processing, Artificial intelligence, Decoding methods, Sentence, Semantics | Conference

ISBN | Citations | PageRank
---|---|---
978-1-5386-5322-7 | 0 | 0.34

References | Authors
---|---
1 | 5
Name | Order | Citations | PageRank |
---|---|---|---
Junwei Zhou | 1 | 118 | 16.64 |
Xi Wang | 2 | 0 | 1.69 |
Jizhong Han | 3 | 355 | 54.72 |
Songlin Hu | 4 | 126 | 30.82 |
Hongchao Gao | 5 | 0 | 2.70 |