Abstract | ||
---|---|---|
This paper presents our approach to the Audio-Video Based Emotion Recognition subtask of the 2017 Emotion Recognition in the Wild (EmotiW) Challenge. The task is to predict one of the seven basic emotions for each short video segment. We extract a variety of features from the audio and facial expression modalities. We also explore a temporal LSTM model that takes frame-level facial features as input and improves on the performance of the non-temporal model. Fusing the different modality features with the temporal model achieves 58.5% accuracy on the test set, which demonstrates the effectiveness of our methods.
Year | Venue | Field |
---|---|---|
2017 | ICMI | Modalities, Emotion recognition, Computer science, Emotion classification, Temporal models, Speech recognition, Human–computer interaction, Facial expression |

DocType | ISBN | Citations |
---|---|---|
Conference | 978-1-4503-5543-8 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 16 | 7 |

Name | Order | Citations | PageRank |
---|---|---|---|
Shuai Wang | 1 | 252 | 48.81 |
Wenxuan Wang | 2 | 4 | 5.52 |
Jinming Zhao | 3 | 5 | 2.86 |
Shizhe Chen | 4 | 238 | 21.83 |
Qin Jin | 5 | 639 | 66.86 |
Shilei Zhang | 6 | 57 | 9.81 |
Yong Qin | 7 | 161 | 42.54 |