Title
Integration of Textual Cues for Fine-Grained Image Captioning Using Deep CNN and LSTM
Abstract
The automatic narration of a natural scene is an important trait in artificial intelligence that unites computer vision and natural language processing. Caption generation is a challenging task in scene understanding. Most state-of-the-art methods use deep convolutional neural network models to extract visual features of the entire image, based on which the parallel structures between images and sentences are exploited using recurrent neural networks for image captioning. However, such models exploit only visual features for caption generation. This work investigates whether fusing the text available in an image can yield more fine-grained captioning of a scene. In this paper, we propose a model which incorporates a deep convolutional neural network and long short-term memory to boost the accuracy of image captioning by fusing the text features available in an image with the visual features extracted in state-of-the-art methods. We have validated the effectiveness of the proposed model on the benchmark datasets (Flickr8k and Flickr30k). The experimental outcomes illustrate that the proposed model outperformed the state-of-the-art methods for image captioning.
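The fusion step described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, the concatenation-based early fusion, and the single hand-written LSTM step are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the input, h the hidden state, c the cell state.

    W, U, b hold the stacked parameters of the four gates
    (input, forget, output, candidate), each of hidden size H.
    """
    z = W @ x + U @ h + b               # stacked gate pre-activations, shape (4H,)
    H = h.shape[0]
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))   # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2 * H:3 * H]))  # output gate
    g = np.tanh(z[3 * H:])                  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Hypothetical feature sizes (not taken from the paper)
V, T, H = 2048, 300, 512    # CNN visual features, scene-text features, LSTM hidden size

visual_feat = rng.standard_normal(V)  # stand-in for a CNN pooling-layer output
text_feat = rng.standard_normal(T)    # stand-in for an embedding of text detected in the image

# Early fusion by concatenation: the LSTM conditions on both modalities at once
fused = np.concatenate([visual_feat, text_feat])

W = rng.standard_normal((4 * H, V + T)) * 0.01
U = rng.standard_normal((4 * H, H)) * 0.01
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(fused, h, c, W, U, b)
print(h.shape)  # (512,)
```

In a full captioning model this fused vector would initialize (or be fed at each step of) the LSTM decoder, whose hidden state is projected onto the vocabulary to emit caption words one at a time.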
Year: 2020
DOI: 10.1007/s00521-019-04515-z
Venue: NEURAL COMPUTING & APPLICATIONS
Keywords: Text saliency, Image captioning, Convolution neural network, Long short-term memory
DocType: Journal
Volume: 32
Issue: 24
ISSN: 0941-0643
Citations: 2
PageRank: 0.38
References: 0
Authors: 2
  1. Neeraj Gupta (Citations: 5, PageRank: 1.77)
  2. Anand Singh Jalal (Citations: 138, PageRank: 28.45)