Title
A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
Abstract
A novel hierarchical multimodal attention-based model is developed in this paper to generate more accurate and descriptive captions for images. Our model is an "end-to-end" neural network that contains three related sub-networks: a deep convolutional neural network to encode image contents, a recurrent neural network to identify the objects in images sequentially, and a multimodal attention-based recurrent neural network to generate image captions. The main contribution of our work is that the hierarchical structure and the multimodal attention mechanism are applied together, so that each caption word can be generated with multimodal attention on both the intermediate semantic objects and the global visual content. Our experiments on two benchmark datasets have obtained very positive results.
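The abstract's key idea, attending jointly over intermediate semantic object features and the global visual feature at each decoding step, can be illustrated with a minimal sketch. This is not the paper's implementation: the weight matrices, dimensions, and the simple dot-product attention below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def multimodal_attention_step(h, object_feats, global_feat, Wa, Wo):
    """One hypothetical decoding step: attend over the intermediate
    object features plus the global visual feature, then score the
    vocabulary from the hidden state and the attended context.
    Wa and Wo are random stand-ins, not the paper's parameters."""
    candidates = np.vstack([object_feats, global_feat])  # (k+1, d)
    scores = candidates @ (Wa @ h)                       # (k+1,) attention scores
    alpha = softmax(scores)                              # attention weights, sum to 1
    context = alpha @ candidates                         # (d,) attended multimodal context
    logits = Wo @ np.concatenate([h, context])           # (vocab,) word scores
    return logits, alpha

rng = np.random.default_rng(0)
d, k, vocab = 8, 3, 10                    # toy sizes (assumed)
h = rng.standard_normal(d)                # decoder hidden state
object_feats = rng.standard_normal((k, d))  # k intermediate object features
global_feat = rng.standard_normal(d)      # global image feature
Wa = rng.standard_normal((d, d))
Wo = rng.standard_normal((vocab, 2 * d))
logits, alpha = multimodal_attention_step(h, object_feats, global_feat, Wa, Wo)
```

In this sketch, the decoder can shift its attention weights between individual objects and the whole-image feature at every word, which is the intuition behind generating each caption word from both levels of visual information.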
Year
2017
DOI
10.1145/3077136.3080671
Venue
SIGIR
Keywords
Image Captioning, Multimodal Attention, Hierarchical Recurrent Neural Network, Long Short-Term Memory Model
Field
ENCODE, Closed captioning, Convolutional neural network, Computer science, Recurrent neural network, Speech recognition, Time delay neural network, Artificial neural network
DocType
Conference
ISBN
978-1-4503-5022-8
Citations
5
PageRank
0.38
References
6
Authors
6
Name           Order  Citations  PageRank
Yong Cheng     1      21         5.17
Huang Fei      2      17         4.28
Lian Zhou      3      34         5.77
Cheng Jin      4      78         14.92
Yuejie Zhang   5      127        25.82
Tao Zhang      6      422        100.57