Title | ||
---|---|---|
Time-Frequency Attention for Speech Emotion Recognition with Squeeze-and-Excitation Blocks |
Abstract | ||
---|---|---|
In the field of Human-Computer Interaction (HCI), Speech Emotion Recognition (SER) is not only a fundamental step towards intelligent interaction but also plays an important role in smart environments, e.g., elderly home monitoring. Most deep-learning-based SER systems focus on handling high-level emotion-relevant features, so the low-level feature differences between the time and frequency dimensions are rarely analyzed. This leads to unsatisfactory accuracy in speech emotion recognition. In this paper, we propose Time-Frequency Attention (TFA) to mine significant low-level emotion features from the time domain and the frequency domain. To make full use of the global information after the feature fusion conducted by the TFA, we utilize Squeeze-and-Excitation (SE) blocks to compare emotion features from different channels. Experiments are conducted on a benchmark database, Interactive Emotional Dyadic Motion Capture (IEMOCAP). The results indicate that the proposed model outperforms the state-of-the-art methods, with absolute increases of 1.7% and 3.2% on average class accuracy among four emotion classes and weighted accuracy, respectively. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/978-3-030-98358-1_42 | MULTIMEDIA MODELING (MMM 2022), PT I |
Keywords | DocType | Volume |
---|---|---|
Speech Emotion Recognition, Convolutional Neural Network, Time-Frequency Attention, Low-Level Emotion Feature | Conference | 13141 |
ISSN | Citations | PageRank |
---|---|---|
0302-9743 | 1 | 0.37 |
References | Authors | |
---|---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ke Liu | 1 | 1 | 0.37 |
Chen Wang | 2 | 3 | 9.53 |
Jiayue Chen | 3 | 1 | 0.37 |
Jun Feng | 4 | 1 | 2.74 |