Abstract |
---|
Deep architectures using identity skip-connections have demonstrated groundbreaking performance in the field of image classification. Recent empirical studies suggest that identity skip-connections enable ensemble-like behaviour of shallow networks, and that depth is not the sole ingredient for their success. We therefore examine the potential of identity skip-connections for the task of Speech Emotion Recognition (SER), where moderately deep temporal architectures are often employed. To this end, we propose a novel architecture which regulates unimpeded feature flows and captures long-term dependencies via gate-based skip-connections and a memory mechanism. Our proposed architecture is compared to other state-of-the-art methods of SER and is evaluated on large aggregated corpora recorded in different contexts. It outperforms the state-of-the-art methods by 9-15% and achieves an Unweighted Accuracy of 80.5% under an imbalanced class distribution. In addition, we examine a variant adopting the simplified skip-connections of Residual Networks (ResNet) and show that gate-based skip-connections are more effective than simplified skip-connections. |
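The contrast the abstract draws between gate-based and simplified (ResNet-style) skip-connections can be illustrated with a small sketch. It is not the authors' architecture: the abstract gives no equations, so the code assumes the commonly used highway-style gate `y = T(x) * H(x) + (1 - T(x)) * x` and the residual form `y = H(x) + x`. The function names, the `tanh` transform, and the weight layout are all hypothetical choices made for illustration, and the memory mechanism is not modelled.

```python
import math


def sigmoid(z):
    """Logistic function used for the (assumed) transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def residual_layer(x, w_h, b_h):
    """Simplified skip-connection (ResNet style): y = H(x) + x."""
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    return [hi + xi for hi, xi in zip(h, x)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Gate-based skip-connection (assumed highway form):
    y = T(x) * H(x) + (1 - T(x)) * x, with T(x) in (0, 1) per feature."""
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    zeros = [[0.0] * 3 for _ in range(3)]
    # A strongly negative gate bias keeps the gate nearly closed,
    # so the layer passes its input through almost unchanged.
    print(highway_layer(x, zeros, [0.0] * 3, zeros, [-50.0] * 3))
    # The residual layer always adds the input back, with no gating.
    print(residual_layer(x, zeros, [0.0] * 3))
```

In this sketch the gate decides, per feature, how much of the transformed signal replaces the input, whereas the residual layer always sums the two. That learned control over the identity path is the difference the abstract refers to when comparing the two variants.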
Year | DOI | Venue |
---|---|---|
2017 | 10.1145/3123266.3123353 | MM '17: ACM Multimedia Conference, Mountain View, California, USA, October 2017 |

Keywords | Field | DocType |
---|---|---|
deep learning, speech emotion recognition, residual network, highway network | Residual, Computer vision, Architecture, Computer science, Emotion recognition, Speech recognition, Temporal models, Artificial intelligence, Deep learning, Contextual image classification, Residual neural network, Empirical research | Conference |

ISBN | Citations | PageRank |
---|---|---|
978-1-4503-4906-2 | 2 | 0.36 |

References | Authors |
---|---|
20 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Jae-Bok Kim | 1 | 30 | 4.43 |
Gwenn Englebienne | 2 | 846 | 45.79 |
Khiet P. Truong | 3 | 302 | 32.64 |
Vanessa Evers | 4 | 836 | 80.72 |