Title
Multi-modal Fusion for Video Sentiment Analysis
Abstract
Automatic sentiment analysis can help reveal a subject's emotional state and opinion tendency toward an entity. In this paper, we present our solutions for the MuSe-Wild sub-challenge of the Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 challenge. The videos in this challenge are emotional car reviews collected from YouTube. In these scenarios, the speaker's sentiment is conveyed through multiple modalities, including the acoustic, visual, and textual modalities. Because the modalities are complementary, their fusion has a large impact on sentiment analysis. We highlight two aspects of our solutions: 1) we explore various low-level and high-level features from different modalities for emotional state recognition, such as expert-defined low-level descriptors (LLDs) and deep learned features; 2) we propose several effective multi-modal fusion strategies to make full use of the different modalities. Our solutions achieve the best CCC performance of 0.4346 on arousal and 0.4513 on valence on the challenge test set, significantly outperforming the baseline system, whose corresponding CCC is 0.2843 on arousal and 0.2413 on valence. The experimental results show that our proposed representations of the different modalities and fusion strategies generalize well and yield more robust performance.
Year
2020
DOI
10.1145/3423327.3423671
Venue
MM
DocType
Conference
Citations
1
PageRank
0.36
References
0
Authors
5
Name          Order  Citations  PageRank
Ruichen Li    1      3          2.08
Jinming Zhao  2      27         2.85
Jingwen Hu    3      49         2.78
Shuai Guo     4      4          1.48
Qin Jin       5      6396       6.86