Title | ||
---|---|---|
An Occam's Razor View on Learning Audiovisual Emotion Recognition with Small Training Sets. |
Abstract | ||
---|---|---|
This paper presents a lightweight and accurate deep neural model for audiovisual emotion recognition. To design this model, the authors followed a philosophy of simplicity, drastically limiting the number of parameters to learn from the target datasets and always choosing the simplest learning methods: i) transfer learning and low-dimensional space embedding reduce the dimensionality of the representations; ii) visual temporal information is handled by a simple score-per-frame selection process, averaged across time; iii) a simple frame selection mechanism weights the images within a sequence; iv) the different modalities are fused at prediction level (late fusion). The paper also highlights the inherent challenges of the AFEW dataset and the difficulty of model selection with as few as 383 validation sequences. The proposed real-time emotion classifier achieved a state-of-the-art accuracy of 60.64% on the test set of AFEW and ranked 4th at the Emotion in the Wild 2018 challenge.
|
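The aggregation steps named in the abstract (per-frame scores averaged across time, per-frame weighting, and late fusion at prediction level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example scores, the frame weights, and the equal modality weights are all hypothetical, and the sketch assumes each model already outputs one score vector per frame or per clip.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability vector (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(score_vectors, weights=None):
    """Average equal-length score vectors, optionally with non-negative weights."""
    if weights is None:
        weights = [1.0] * len(score_vectors)  # plain mean
    total = sum(weights)
    num_classes = len(score_vectors[0])
    return [
        sum(w * vec[c] for w, vec in zip(weights, score_vectors)) / total
        for c in range(num_classes)
    ]


def aggregate_frames(frame_scores, frame_weights=None):
    """Collapse per-frame class scores into one score vector per sequence."""
    return weighted_average(frame_scores, frame_weights)


def late_fusion(modality_scores, modality_weights=None):
    """Fuse per-modality predictions at prediction level."""
    return weighted_average(modality_scores, modality_weights)


# Hypothetical three-class example: four video frames and one audio prediction.
visual_frame_scores = [
    softmax([2.0, 0.5, 0.1]),
    softmax([1.5, 0.7, 0.2]),
    softmax([0.2, 0.3, 0.1]),  # uninformative frame
    softmax([1.8, 0.4, 0.3]),
]
frame_weights = [1.0, 1.0, 0.1, 1.0]  # assumed output of a frame-weighting step
audio_scores = softmax([0.9, 1.1, 0.2])

visual_scores = aggregate_frames(visual_frame_scores, frame_weights)
fused_scores = late_fusion([visual_scores, audio_scores])
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
print(fused_scores, predicted_class)
```

Because fusion happens on the prediction vectors alone, this stage has no parameters to learn beyond the optional weights, which matches the low-parameter philosophy the abstract describes.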
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3242969.3264980 | ICMI |
Keywords | DocType | Volume |
Emotion Recognition, Deep Learning | Conference | abs/1808.02668 |
Conference details | ISBN | Citations |
ICMI (EmotiW) 2018, Oct 2018, Boulder, Colorado, United States | 978-1-4503-5692-3 | 1 |
PageRank | References | Authors |
0.36 | 22 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Valentin Vielzeuf | 1 | 18 | 2.69 |
Corentin Kervadec | 2 | 1 | 0.36 |
Stephane Pateux | 3 | 24 | 2.06 |
Alexis Lechervy | 4 | 6 | 3.52 |
Frédéric Jurie | 5 | 3924 | 235.82 |