Abstract |
---|
Caption generation remains a hard problem in artificial intelligence: a textual description must be generated for a given image, combining both computer vision and natural language processing. The CNN-RNN encoder-decoder is a popular architecture for image captioning, and many variants of it exist, among which the attention mechanism is an important development. Recently, deep learning methods have achieved state-of-the-art results on this problem. In this paper, we present a model that generates natural language descriptions of given images. Our approach uses pre-trained deep neural network models to extract visual features and then applies an LSTM to generate captions. We use BLEU scores to evaluate model performance on the Flickr8k and Flickr30k datasets. In addition, we carry out a comparison between approaches with and without the attention mechanism.
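The abstract evaluates captions with BLEU scores. As a minimal pure-Python sketch of sentence-level BLEU (geometric mean of clipped n-gram precisions with a brevity penalty, per Papineni et al.), not taken from the paper's own code, a `bleu` helper might look like:

```python
from collections import Counter
import math

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty.
    candidate: list of tokens; references: list of token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        # Count candidate n-grams.
        cand_ngrams = Counter(
            tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)
        )
        # For each n-gram, take the max count over all references (clipping bound).
        max_ref = Counter()
        for ref in references:
            ref_ngrams = Counter(
                tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)
            )
            for g, c in ref_ngrams.items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision drives the geometric mean to zero
    # Brevity penalty against the closest-length reference.
    closest = min(references, key=lambda r: abs(len(r) - len(candidate)))
    bp = 1.0 if len(candidate) >= len(closest) else math.exp(1 - len(closest) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

In practice, toolkits such as NLTK provide BLEU with smoothing options; this sketch only illustrates the metric's structure.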
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3310986.3311002 | Proceedings of the 3rd International Conference on Machine Learning and Soft Computing |
Keywords | DocType | ISBN |
CNN, Image captioning, LSTM, RNN, attention mechanism | Conference | 978-1-4503-6612-0 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tien X. Dang | 1 | 0 | 0.34 |
Aran Oh | 2 | 0 | 0.68 |
In Seop Na | 3 | 42 | 13.83 |
Soo Hyung Kim | 4 | 29 | 6.39 |