Title
TLEFuzzyNet: Fuzzy Rank-Based Ensemble of Transfer Learning Models for Emotion Recognition From Human Speeches
Abstract
Human speech is not only a verbal medium of communication; it also conveys emotion. The past decade has seen extensive research on speech data, which is especially important for human-computer interaction as well as healthcare, security, and entertainment. This paper proposes TLEFuzzyNet, a three-stage pipeline for emotion recognition from speech. The first stage performs feature extraction: speech signals are augmented and converted to Mel spectrograms. The second stage applies three pretrained transfer learning CNN models, namely ResNet18, Inception_v3, and GoogleNet, whose prediction scores are fed to the third stage. In the final stage, fuzzy ranks are assigned using a modified Gompertz function, which produces the final prediction scores from the individual scores of the three CNN models. We evaluate TLEFuzzyNet on the Surrey Audio-Visual Expressed Emotion (SAVEE), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Berlin Database of Emotional Speech (EmoDB) datasets, where it achieves state-of-the-art performance, making it a dependable framework for speech emotion recognition (SER). The code is available on GitHub: https://github.com/KaramSahoo/SpeechEmotionRecognitionFuzzy
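The fusion stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exact form of the modified Gompertz function is not given here, so the curve 1 - exp(-exp(-2p)) is an assumption, as are the function names gompertz_rank and fuse, the rule of summing ranks and picking the smallest, and the example scores.

```python
import math
from typing import List, Tuple


def gompertz_rank(score: float) -> float:
    """Map a confidence score in [0, 1] to a fuzzy rank.

    Assumed form (not taken from the paper): 1 - exp(-exp(-2 * score)).
    The curve is decreasing, so a higher confidence gives a lower rank.
    """
    return 1.0 - math.exp(-math.exp(-2.0 * score))


def fuse(model_scores: List[List[float]]) -> Tuple[int, List[float]]:
    """Sum the fuzzy ranks of each class across models.

    Returns the index of the class with the lowest fused rank,
    together with the fused rank of every class.
    """
    n_classes = len(model_scores[0])
    if any(len(scores) != n_classes for scores in model_scores):
        raise ValueError("all models must score the same number of classes")
    fused = [
        sum(gompertz_rank(scores[c]) for scores in model_scores)
        for c in range(n_classes)
    ]
    winner = min(range(n_classes), key=fused.__getitem__)
    return winner, fused


# Hypothetical per-class scores from the three CNNs for one utterance.
resnet18 = [0.70, 0.20, 0.10]
inception_v3 = [0.40, 0.50, 0.10]
googlenet = [0.60, 0.30, 0.10]

predicted, fused = fuse([resnet18, inception_v3, googlenet])
print("predicted class:", predicted)
print("fused ranks:", [round(r, 3) for r in fused])
```

In this toy input Inception_v3 prefers class 1 while the other two models prefer class 0. Because the ranks are summed over all three models, the fused result follows the class that is scored highly most consistently.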
Year
2021
DOI
10.1109/ACCESS.2021.3135658
Venue
IEEE ACCESS
Keywords
Databases, Speech recognition, Transfer learning, Feature extraction, Hidden Markov models, Emotion recognition, Spectrogram, TLEFuzzyNet, deep learning, fuzzy rank-based ensemble, speech emotion recognition, spectrogram, transfer learning, Gompertz function
DocType
Journal
Volume
9
ISSN
2169-3536
Citations
0
PageRank
0.34
References
0
Authors
5
Name | Order | Citations | PageRank
Karam Kumar Sahoo | 1 | 0 | 0.68
Ishan Dutta | 2 | 0 | 0.34
Muhammad Fazal Ijaz | 3 | 17 | 6.13
Marcin Wozniak | 4 | 36 | 13.22
Pawan Kumar Singh | 5 | 0 | 0.68