Title
Video Multimodal Emotion Recognition Based on Bi-GRU and Attention Fusion
Abstract
This paper proposes a video multimodal emotion recognition method based on Bi-GRU and attention fusion. A bidirectional gated recurrent unit (Bi-GRU) is applied to improve the accuracy of emotion recognition in temporal contexts. A new network initialization method is proposed and applied to the network model, which further improves the accuracy of video emotion recognition in temporal-context learning. To overcome the use of identical weights for every modality in multimodal fusion, a video multimodal emotion recognition method based on an attention fusion network is proposed. The attention fusion network calculates the attention distribution over the modalities at each moment in real time, so that the network model can learn multimodal contextual information in real time. The experimental results show that the proposed method improves the accuracy of emotion recognition in each of the three single modalities (textual, visual, and audio) as well as the accuracy of video multimodal emotion recognition. The proposed method outperforms the existing state-of-the-art methods for multimodal emotion recognition in both sentiment classification and sentiment regression.
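The idea of computing an attention distribution over the modalities at each moment can be illustrated with a minimal sketch. This is not the network described in the paper: it assumes each modality already provides a feature vector of the same length at every time step (for example, the output of a per-modality context encoder such as a Bi-GRU), and it scores modalities by a dot product with a single query vector. The names `fuse_step`, `fuse_sequence`, and `query` are hypothetical, and the toy values are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_step(features: Dict[str, Vector], query: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Fuse one time step.

    Assumption: every modality vector has the same length as `query`.
    Each modality is scored by its dot product with `query`; the scores
    are normalized with softmax and used to weight the modality vectors.
    """
    names = list(features)
    scores = [sum(q * x for q, x in zip(query, features[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * features[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


def fuse_sequence(sequence: List[Dict[str, Vector]], query: Vector) -> List[Tuple[Vector, Dict[str, float]]]:
    """Apply the fusion independently at every time step of a sequence."""
    return [fuse_step(step, query) for step in sequence]


if __name__ == "__main__":
    # Toy two-step sequence with made-up 2-dimensional features.
    toy_sequence = [
        {"text": [1.0, 0.0], "visual": [0.0, 1.0], "audio": [0.0, 0.0]},
        {"text": [0.2, 0.1], "visual": [0.9, 0.3], "audio": [0.4, 0.4]},
    ]
    for fused, weights in fuse_sequence(toy_sequence, query=[2.0, 0.0]):
        print(weights, fused)
```

Because the weights are recomputed at every time step, a modality that is informative at one moment can dominate the fused vector there and be down-weighted at another, in contrast to fusion with one fixed weight per modality. In a trained model the query (or a richer scoring function) would be learned rather than fixed by hand.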
Year: 2021
DOI: 10.1007/s11042-020-10030-4
Venue: Multimedia Tools and Applications
Keywords: Video emotion recognition, Multimodal, Bi-GRU, Attention mechanism, Fusion
DocType: Journal
Volume: 80
Issue: 6
ISSN: 1380-7501
Citations: 1
PageRank: 0.35
References: 0
Authors: 6
Name            Order  Citations  PageRank
Ruohong Huan    1      18         6.30
Jia Shu         2      1          0.35
Sheng-Lin Bao   3      1          0.35
Ronghua Liang   4      376        42.60
Peng Chen       5      14         7.57
Kaikai Chi      6      114        20.18