Title
Understanding auditory representations of emotional expressions with neural networks
Abstract
In contrast to many established emotion recognition systems, convolutional neural networks do not rely on handcrafted features to categorize emotions. Although these networks achieve state-of-the-art performance, it is still not fully understood what they learn and how the learned representations correlate with the emotional characteristics of speech. The aim of this work is to contribute to a deeper understanding of the acoustic and prosodic features that are relevant for the perception of emotional states. Firstly, an artificial deep neural network architecture is proposed that learns the auditory features directly from the raw, unprocessed speech signal. Secondly, we introduce two novel methods for analyzing the implicitly learned representations, based on data-driven and network-driven visualization techniques. Using these methods, we identify how the network categorizes an audio signal in a two-dimensional representation of emotions, namely valence and arousal. The proposed approach is a general method that enables a deeper analysis and understanding of the representations most relevant to perceiving emotional expressions in speech.
Year
2020
DOI
10.1007/s00521-018-3869-3
Venue
Neural Computing and Applications
Keywords
Auditory emotion categorization, Affect analysis, Dimensional emotions, Deep neural network
Field
Arousal, Audio signal, Categorization, Convolutional neural network, Speech recognition, Emotional expression, Artificial intelligence, Artificial neural network, Perception, Mathematics, Machine learning, Creative visualization
DocType
Journal
Volume
32
Issue
4
ISSN
1433-3058
Citations
0
PageRank
0.34
References
35
Authors
4
Name
Order
Citations
PageRank
Iris Wieser100.34
Pablo V. A. Barros211922.02
Stefan Heinrich3285.50
Stefan Wermter41100151.62