Title
Watch What You Just Said: Image Captioning with Text-Conditional Attention.
Abstract
Attention mechanisms have attracted considerable interest in image captioning due to their strong performance. However, existing methods compute attention from visual content only, and whether textual context can improve attention in image captioning remains an open question. To explore this question, we propose a novel attention mechanism, called text-conditional attention, which allows the caption generator to focus on certain image features given the previously generated text. To obtain text-related image features for our attention model, we adopt the guiding Long Short-Term Memory (gLSTM) captioning architecture with CNN fine-tuning. Our proposed method allows joint learning of the image embedding, text embedding, text-conditional attention, and language model within one network architecture in an end-to-end manner. We perform extensive experiments on the MS-COCO dataset. The experimental results show that our method outperforms state-of-the-art captioning methods on various quantitative metrics as well as in human evaluation, which supports the use of text-conditional attention in image captioning.
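The text-conditional attention described in the abstract can be sketched roughly as follows. This is an illustrative NumPy toy, not the authors' implementation: the array sizes, the bilinear scoring form, and all names (`text_conditional_attention`, `W`, etc.) are assumptions. The idea shown is that image region features are scored against the embedding of the previously generated word, and the resulting softmax weights produce a text-conditioned image context vector.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def text_conditional_attention(regions, prev_word_emb, W):
    """Re-weight image region features conditioned on the previous word.

    regions:       (k, d) array of k image region features
    prev_word_emb: (e,) embedding of the previously generated word
    W:             (d, e) learned projection (hypothetical parameterization)
    """
    # score each region against the text condition (bilinear form)
    scores = regions @ (W @ prev_word_emb)   # shape (k,)
    alpha = softmax(scores)                  # attention weights over regions
    context = alpha @ regions                # shape (d,), attended image feature
    return context, alpha

# toy usage with random features
rng = np.random.default_rng(0)
regions = rng.standard_normal((5, 8))        # 5 regions, 8-dim features
prev = rng.standard_normal(16)               # 16-dim word embedding
W = rng.standard_normal((8, 16))
ctx, alpha = text_conditional_attention(regions, prev, W)
print(alpha)  # weights sum to 1; the context vector would feed the LSTM decoder
```

In the paper's full model this attention would be learned jointly with the embeddings and the gLSTM language model; here the projection is random purely to show the data flow.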
Year
2017
DOI
10.1145/3126686.3126717
Venue
MM '17: ACM Multimedia Conference, Mountain View, California, USA, October 2017
Keywords
image captioning, multi-modal embedding, LSTM, Neural Network
Field
Closed captioning, Computer science, Image retrieval, Network architecture, Artificial intelligence, Language model, Architecture, Embedding, Automatic image annotation, Information retrieval, Feature (computer vision), Speech recognition, Machine learning
DocType
Conference
ISBN
978-1-4503-5416-5
Citations
12
PageRank
0.60
References
34
Authors
4
Name              Order  Citations  PageRank
Luowei Zhou       1      54         6.95
Chenliang Xu      2      434        28.73
Parker Koch       3      12         0.60
Jason J. Corso    4      37         3.84