Title
Empirical evaluation of emotion classification accuracy for non-acted speech
Abstract
Emotion recognition is important in the workplace because it affects a multitude of outcomes, such as performance, engagement, and well-being. Emotion recognition from audio is an attractive option due to its non-obtrusive nature and the availability of microphones in workplace devices. We describe building a classifier that analyzes the para-linguistic features of audio streams to classify them into positive, neutral, and negative affect. Since speech in the workplace differs from acted speech, and because it is important that the training data be situated in the right context, we designed and executed an emotion induction procedure to generate a corpus of non-acted speech from 33 speakers. The corpus was used to train a set of classification models, and a comparative analysis of these models was used to choose the feature parameters. Bootstrap aggregation (bagging) was then applied to the best combination of algorithm (Random Forest) and features (60-millisecond window size). The resulting classification accuracy of 73% is on par with, or exceeds, accuracies reported in the current literature for non-acted speech on a speaker-dependent test set. For reference, we also report the speaker-dependent recognition accuracy (95%) of the same classifier trained and tested on acted speech for three emotions from the Emo-DB database.
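The pipeline outlined in the abstract (frame-level para-linguistic features over 60 ms windows, fed to a bagged Random Forest) can be sketched roughly as below. This is a minimal sketch, not the authors' implementation: it assumes scikit-learn and librosa are available, uses MFCCs as a stand-in for the paper's unspecified para-linguistic feature set, and the frame_features helper and placeholder data are illustrative only.

```python
# Minimal sketch (not the authors' code): bagging of Random Forest classifiers
# over frame-level audio features, with ~60 ms analysis windows.
import numpy as np
import librosa
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

def frame_features(path, sr=16000, win_ms=60):
    """MFCCs over ~60 ms windows (placeholder for para-linguistic features)."""
    y, _ = librosa.load(path, sr=sr)
    n_fft = int(sr * win_ms / 1000)              # 60 ms -> 960 samples at 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=n_fft // 2)
    return mfcc.T                                # one feature vector per frame

# X: stacked frame features for all utterances; y: affect labels
# (0 = negative, 1 = neutral, 2 = positive). Random stand-in data here.
X = np.random.rand(300, 13)
y = np.random.randint(0, 3, size=300)

# Bootstrap aggregation of Random Forests (scikit-learn >= 1.2 uses
# `estimator=`; older versions use `base_estimator=`).
clf = BaggingClassifier(
    estimator=RandomForestClassifier(n_estimators=100),
    n_estimators=10,
)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```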
Year
2017
DOI
10.1109/MMSP.2017.8122261
Venue
2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)
Keywords
classification models, feature parameters, speaker-dependent test set, speaker-dependent recognition accuracy, classifier, acted speech, empirical evaluation, emotion classification accuracy, non-acted speech, emotion recognition, attractive option, para-linguistic features, audio streams, training data, emotion induction procedure, non-acted speech data, Emo-DB database, bootstrap aggregation, classification accuracy, time 60.0 ms
Field
Pattern recognition, Emotion recognition, Computer science, Emotion classification, Feature extraction, Speech recognition, Bootstrap aggregating, Artificial intelligence, Affect (psychology), Classifier (linguistics), Random forest, Test set
DocType
Conference
ISSN
2163-3517
ISBN
978-1-5090-3650-9
Citations
1
PageRank
0.34
References
17
Authors
5