Abstract |
---|
Deep neural networks (DNNs) have consistently advanced the state of the art in many fields, including speech emotion recognition. However, DNN-based solutions require vast amounts of labeled data for training. In speech emotion recognition, the cost and time needed to annotate data with emotional labels can be prohibitive. The available corpora typically contain a few thousand recordings from a limited number of speakers. As a result, models trained on such corpora fail to generalize to samples from new domains. This study explores practical solutions for training DNNs for speech emotion recognition with limited resources by using active learning (AL). We assume that data without emotional labels from a new domain are available, and that we have the resources to annotate a limited number of selected recordings with emotional labels. We actively select samples using greedy sampling (GS) and uncertainty-based methods, and evaluate performance on regression problems where the goal is to predict arousal and valence scores. We show that active learning leads to competitive performance with limited training data. |
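
The abstract names two families of selection criteria but does not spell out their exact form. The sketch below shows one common reading of each, under stated assumptions; it is not the authors' implementation. `greedy_sampling` (a hypothetical name) assumes each unlabeled recording is represented by a fixed-length feature vector and repeatedly picks the recording farthest from everything already labeled or selected, seeding with the recording nearest the feature centroid when nothing is labeled yet. `uncertainty_sampling` (also hypothetical) assumes several arousal or valence predictions per recording, for example from an ensemble or repeated dropout passes, and picks the recordings whose predictions disagree most.

```python
import numpy as np


def greedy_sampling(features, n_select, labeled_idx=()):
    """Greedy sampling in feature space (illustrative sketch).

    Assumption: each row of `features` is a feature vector for one recording.
    Returns indices of `n_select` newly chosen recordings.
    """
    features = np.asarray(features, dtype=float)
    if n_select <= 0:
        return []
    chosen = []
    anchors = list(labeled_idx)
    if not anchors:
        # Assumption: with no labeled data, seed with the sample closest to the centroid.
        centroid = features.mean(axis=0)
        seed = int(np.argmin(np.linalg.norm(features - centroid, axis=1)))
        chosen.append(seed)
        anchors.append(seed)
    # min_dist[i] = distance from sample i to its nearest labeled or chosen sample.
    diffs = features[:, None, :] - features[anchors][None, :, :]
    min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    while len(chosen) < n_select:
        # Pick the sample that is farthest from everything selected so far.
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        # Refresh nearest-neighbor distances with the newly chosen sample.
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[nxt], axis=1))
    return chosen


def uncertainty_sampling(predictions, n_select):
    """Uncertainty-based selection (illustrative sketch).

    Assumption: `predictions` has shape (n_models, n_samples), holding several
    predicted scores per recording. Samples with the highest variance across
    those predictions are selected first.
    """
    variance = np.var(np.asarray(predictions, dtype=float), axis=0)
    order = np.argsort(-variance, kind="stable")
    return [int(i) for i in order[:n_select]]


features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [-5.0, 5.0], [0.0, -6.0]])
print(greedy_sampling(features, 3))
preds = np.array([[1.0, 2.0, 3.0], [1.0, 4.0, 3.5], [1.0, 0.0, 2.5]])
print(uncertainty_sampling(preds, 2))
```

Greedy sampling needs no trained model and so can run before any label exists, which suits the cold-start setting the abstract describes. Uncertainty-based selection needs a model trained on some labeled data, but it targets the recordings the current model finds hardest.
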

Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ACII.2019.8925524 | 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII) |

Keywords | Field | DocType |
---|---|---|
Speech emotion recognition, active learning, multitask autoencoder | Training set, Data modeling, Arousal, Active learning, Communication, Emotion recognition, Computer science, Speech recognition, Labeled data, Regression problems, Artificial neural network | Conference |

ISSN | ISBN | Citations |
---|---|---|
2156-8103 | 978-1-7281-3889-3 | 2 |

PageRank | References | Authors |
---|---|---|
0.36 | 14 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Mohammed Abdel-Wahab | 1 | 49 | 3.35 |
Carlos Busso | 2 | 1616 | 93.04 |