Title
Multi-modal Fusion for Video Sentiment Analysis
Abstract
Automatic sentiment analysis can help reveal a subject's emotional state and opinion tendency toward an entity. In this paper, we present our solutions for the MuSe-Wild sub-challenge of the Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 challenge. The videos in this challenge are emotional car reviews collected from YouTube. In these scenarios, the speaker's sentiment is conveyed through multiple modalities, including the acoustic, visual, and textual modalities. Because the modalities are complementary, their fusion has a large impact on sentiment analysis. We highlight two aspects of our solutions: 1) we explore various low-level and high-level features from different modalities for emotional state recognition, such as expert-defined low-level descriptors (LLDs) and deep learned features; 2) we propose several effective multi-modal fusion strategies to make full use of the different modalities. Our solutions achieve the best CCC performance of 0.4346 on arousal and 0.4513 on valence on the challenge test set, significantly outperforming the baseline system, whose corresponding CCC is 0.2843 on arousal and 0.2413 on valence. The experimental results show that our proposed representations of the different modalities and fusion strategies generalize well and yield more robust performance.
Year
2020
DOI
10.1145/3423327.3423671
Venue
MM
DocType
Conference
Citations
1
PageRank
0.36
References
0
Authors
5
Name          Order  Citations  PageRank
Ruichen Li    1      3          2.08
Jinming Zhao  2      27         2.85
Jingwen Hu    3      49         2.78
Shuai Guo     4      4          1.48
Qin Jin       5      6396       6.86