| Abstract |
| --- |
| A novel hierarchical multimodal attention-based model is developed in this paper to generate more accurate and descriptive captions for images. Our model is an "end-to-end" neural network consisting of three related sub-networks: a deep convolutional neural network that encodes the image content, a recurrent neural network that identifies the objects in an image sequentially, and a multimodal attention-based recurrent neural network that generates the image caption. The main contribution of our work is that the hierarchical structure and the multimodal attention mechanism are applied together, so that each caption word is generated with multimodal attention on both the intermediate semantic objects and the global visual content. Our experiments on two benchmark datasets have obtained very positive results. |
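The multimodal attention step summarized in the abstract (each caption word attends jointly over the intermediate semantic objects and the global visual content) can be sketched minimally in NumPy. Every dimension, variable name, and the simple dot-product scoring used here is an illustrative assumption, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8            # hypothetical embedding size
num_objects = 5  # intermediate semantic objects from the object-recognition RNN

# Hypothetical inputs: object embeddings, one global visual vector,
# and the caption decoder's hidden state for the current word.
objects = rng.normal(size=(num_objects, d))
global_visual = rng.normal(size=(d,))
word_state = rng.normal(size=(d,))

# Candidate attention contexts: the semantic objects plus the global content.
contexts = np.vstack([objects, global_visual])

# Dot-product attention scores, softmax-normalized into weights.
scores = contexts @ word_state
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Attended multimodal context vector fed to the caption decoder.
context = weights @ contexts
```

The attention weights form a distribution over all candidate contexts, so the decoder can emphasize either a specific detected object or the global scene when emitting each word.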
| Year | DOI | Venue |
| --- | --- | --- |
| 2017 | 10.1145/3077136.3080671 | SIGIR |
| Keywords | Field | DocType |
| --- | --- | --- |
| Image Captioning, Multimodal Attention, Hierarchical Recurrent Neural Network, Long Short-Term Memory Model | ENCODE, Closed captioning, Convolutional neural network, Computer science, Recurrent neural network, Speech recognition, Time delay neural network, Artificial neural network | Conference |
| ISBN | Citations | PageRank |
| --- | --- | --- |
| 978-1-4503-5022-8 | 5 | 0.38 |
| References | Authors |
| --- | --- |
| 6 | 6 |
| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Yong Cheng | 1 | 21 | 5.17 |
| Huang Fei | 2 | 17 | 4.28 |
| Lian Zhou | 3 | 34 | 5.77 |
| Cheng Jin | 4 | 78 | 14.92 |
| Yuejie Zhang | 5 | 127 | 25.82 |
| Tao Zhang | 6 | 422 | 100.57 |