Abstract |
---|
Image captioning is a rapidly progressing field within artificial intelligence with considerable potential. A major problem when working in this field is the limited amount of available data. The only dataset considered suitable for the task is the Microsoft Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images. It covers only about 80 object classes, which is too few to create robust solutions that are not limited by the constraints of the data at hand. To overcome this problem, we propose a solution that incorporates Zero-Shot Learning concepts to identify unknown objects and classes by using semantic word embeddings and existing state-of-the-art object identification algorithms. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and works on the generator's output to inject objects that are not present in the dataset into the caption. We evaluate the model on standardized metrics, namely BLEU, CIDEr and ROUGE-L. The results outperform the underlying model both qualitatively and quantitatively. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/DICTA.2018.8615810 | 2018 Digital Image Computing: Techniques and Applications (DICTA) |
Keywords | Field | DocType |
---|---|---|
Image Caption, Microsoft Common Objects in Context (MSCOCO), Convolutional Neural Network, Recurrent Neural Network | Closed captioning, Pattern recognition, Computer science, Convolutional neural network, As is, Recurrent neural network, Artificial intelligence | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-5386-6603-6 | 0 | 0.34 |
References | Authors |
---|---|
4 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mirza Muhammad Ali Baig | 1 | 0 | 0.34 |
Mian Ihtisham Shah | 2 | 0 | 0.34 |
Muhammad Abdullah Wajahat | 3 | 0 | 0.34 |
Nauman Zafar | 4 | 1 | 1.02 |
Omar Arif | 5 | 22 | 5.87 |