Title
A survey on automatic image caption generation.
Abstract
Image captioning is the task of automatically generating a caption for an image. As a recently emerged research area, it is attracting increasing attention. To caption an image, its semantic information must be captured and expressed in natural language. Because it connects the computer vision and natural language processing research communities, image captioning is a challenging task. Various approaches have been proposed to solve this problem. In this paper, we present a survey of advances in image captioning research. We classify image captioning approaches into categories based on the technique adopted. Representative methods in each category are summarized, and their strengths and limitations are discussed. We first discuss methods used in early work, which are mainly retrieval based and template based. We then focus on neural network based methods, which give state-of-the-art results. These methods are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets. Finally, future research directions are discussed.
Year
2018
DOI
10.1016/j.neucom.2018.05.080
Venue
Neurocomputing
Keywords
Image captioning, Sentence template, Deep neural networks, Multimodal embedding, Encoder–decoder framework, Attention mechanism
Field
Subcategory, Closed captioning, Semantic information, Natural language, Natural language processing, Artificial intelligence, Artificial neural network, Machine learning, Mathematics
DocType
Journal
Volume
311
ISSN
0925-2312
Citations
11
PageRank
0.55
References
72
Authors
2
Name          Order  Citations  PageRank
Shuang Bai    1      45         8.01
Shan An       2      17         4.03