Title
Mood-aware visual question answering.
Abstract
Visual Question Answering (VQA) has recently attracted considerable attention from machine learning researchers. Various attention models have been proposed for VQA to address the need to focus on local regions of an image. This paper proposes Mood-Aware Visual Question Answering (MAVQA), which uses novel long short-term memory (LSTM) and convolutional neural network (CNN) attention models that combine the local image features, the question, and the mood detected from a particular region of the image to produce a mood-based answer, using a pre-processed image dataset. The attention mechanisms enable the VQA model to focus only on the parts of the image that are relevant to both the detected mood and the key words in the question. Irrelevant parts of the image are ignored, which improves classification accuracy by reducing the chance of predicting wrong answers. Whereas previous efforts have used CNNs mostly for embedding images and text, we formulate a CNN attention algorithm for the image, question and mood. When the number of views and the kernel length are optimized, the more direct convolutional attention operation is more efficient and effective than the winding recurrent LSTM attention operation. The experimental results show that MAVQA is effectively mood-aware: the accuracy of our LSTM attention model is well within the range of previous conventional VQA benchmarks, while our novel CNN attention model outperforms the previous baselines in several instances. The additional attention on the mood not only improves classification accuracy but also contributes substantially to the analysis and comprehension of image features, a key development in modern artificial intelligence.
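The abstract's idea of attending only to image regions relevant to both the question and the detected mood can be illustrated with a generic soft-attention sketch. This is not the paper's LSTM or CNN attention model: the function names, the additive combination of question and mood vectors, and the dot-product scoring rule are all assumptions chosen for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, question, mood):
    """Weight image-region features by relevance to question and mood.

    regions:  list of feature vectors, one per image region
    question: question embedding, same length as each region vector
    mood:     mood embedding, same length as each region vector
    Returns (weights, attended_vector).
    """
    # Assumption: the query is the element-wise sum of question and mood.
    query = [q + m for q, m in zip(question, mood)]
    # Assumption: relevance is a plain dot product with the query.
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    # The attended vector is the weighted average of the region features.
    attended = [sum(w * r[i] for w, r in zip(weights, regions))
                for i in range(dim)]
    return weights, attended


# Toy input: only the first region aligns with the question and mood.
regions = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
weights, attended = attend(regions, question=[2.0, 0.0], mood=[1.0, 0.0])
```

In this toy input the first region receives the largest weight, so the regions unrelated to the query contribute little to the attended vector passed on to an answer classifier.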
Year: 2019
DOI: 10.1016/j.neucom.2018.11.049
Venue: Neurocomputing
Keywords: Mood-aware, Visual question answering, Attention model, Long short term memory, Convolutional neural network
Field: Kernel (linear algebra), Mood, Question answering, Embedding, Convolutional neural network, Feature (computer vision), Attention model, Artificial intelligence, Comprehension, Machine learning, Mathematics
DocType: Journal
Volume: 330
ISSN: 0925-2312
Citations: 2
PageRank: 0.35
References: 42
Authors: 5
Name | Order | Citations | PageRank
Nelson Ruwa | 1 | 4 | 1.73
Qirong Mao | 2 | 261 | 34.29
Liangjun Wang | 3 | 7 | 1.79
Jianping Gou | 4 | 116 | 24.01
Ming Dong | 5 | 105 | 13.70