Title
Integration of Textual Cues for Fine-Grained Image Captioning Using Deep CNN and LSTM
Abstract
The automatic narration of a natural scene is an important trait in artificial intelligence that unites computer vision and natural language processing. Caption generation is a challenging task in scene understanding. Most state-of-the-art methods use deep convolutional neural network models to extract visual features of the entire image, based on which the parallel structures between images and sentences are exploited using recurrent neural networks for image captioning. However, such models exploit only visual features for caption generation. This work investigates whether fusing the text available in an image can yield more fine-grained captioning of a scene. In this paper, we propose a model which incorporates a deep convolutional neural network and long short-term memory to boost the accuracy of image captioning by fusing the text features available in an image with the visual features extracted in state-of-the-art methods. We have validated the effectiveness of the proposed model on the benchmark datasets (Flickr8k and Flickr30k). The experimental outcomes illustrate that the proposed model outperformed the state-of-the-art methods for image captioning.
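The fusion step described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, the concatenation-based early fusion, and the single hand-written LSTM step are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the input, h the hidden state, c the cell state.

    W, U, b hold the stacked parameters of the four gates
    (input, forget, output, candidate), each of hidden size H.
    """
    z = W @ x + U @ h + b               # stacked gate pre-activations, shape (4H,)
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))   # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2 * H:3 * H]))  # output gate
    g = np.tanh(z[3 * H:])                  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Hypothetical feature sizes (not taken from the paper)
V, T, H = 2048, 300, 512    # CNN visual features, scene-text features, LSTM hidden size

visual_feat = rng.standard_normal(V)  # stand-in for a CNN pooling-layer output
text_feat = rng.standard_normal(T)    # stand-in for an embedding of text detected in the image

# Early fusion by concatenation: the LSTM conditions on both modalities at once
fused = np.concatenate([visual_feat, text_feat])

W = rng.standard_normal((4 * H, V + T)) * 0.01
U = rng.standard_normal((4 * H, H)) * 0.01
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(fused, h, c, W, U, b)
print(h.shape)  # (512,)
```

In a full captioning model this fused vector would initialize (or be fed at each step of) the LSTM decoder, whose hidden state is projected onto the vocabulary to emit caption words one at a time.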
Year: 2020
DOI: 10.1007/s00521-019-04515-z
Venue: NEURAL COMPUTING & APPLICATIONS
Keywords: Text saliency, Image captioning, Convolution neural network, Long short-term memory
DocType: Journal
Volume: 32
Issue: 24
ISSN: 0941-0643
Citations: 2
PageRank: 0.38
References: 0
Authors: 2
  1. Neeraj Gupta (Citations: 5, PageRank: 1.77)
  2. Anand Singh Jalal (Citations: 138, PageRank: 28.45)