Title
Time-Frequency Attention for Speech Emotion Recognition with Squeeze-and-Excitation Blocks
Abstract
In the field of Human-Computer Interaction (HCI), Speech Emotion Recognition (SER) is not only a fundamental step towards intelligent interaction but also plays an important role in smart environments, e.g., elderly home monitoring. Most deep learning based SER systems focus on high-level emotion-relevant features, so the low-level feature differences between the time and frequency dimensions are rarely analyzed, which leads to unsatisfactory recognition accuracy. In this paper, we propose Time-Frequency Attention (TFA) to mine significant low-level emotion features from the time domain and the frequency domain. To make full use of the global information after the feature fusion performed by the TFA, we use Squeeze-and-Excitation (SE) blocks to compare emotion features across channels. Experiments are conducted on a benchmark database, Interactive Emotional Dyadic Motion Capture (IEMOCAP). The results indicate that the proposed model outperforms state-of-the-art methods, with absolute increases of 1.7% and 3.2% in average class accuracy over four emotion classes and in weighted accuracy, respectively.
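As a rough illustration of the channel reweighting that an SE block performs, the sketch below follows the generic squeeze, excitation, and scale steps in plain Python. It is not the configuration used in the paper, which the abstract does not specify: the function name `se_block`, the reduction ratio `r`, the random weights, the omitted biases, and the tensor sizes are all assumptions made for illustration.

```python
import math
import random


def se_block(feature_maps, w1, w2):
    """Generic SE block on C channels of 2-D (time x frequency) maps.

    w1 has shape (C // r) x C and w2 has shape C x (C // r).
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]
    return scaled, gates


# Hypothetical sizes: 4 channels, reduction ratio 2, 3 time steps, 5 bins.
random.seed(0)
C, r, T, F = 4, 2, 3, 5
x = [[[random.random() for _ in range(F)] for _ in range(T)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
y, gates = se_block(x, w1, w2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another. In a trained network the weights would be learned rather than drawn at random.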
Year
2022
DOI
10.1007/978-3-030-98358-1_42
Venue
MULTIMEDIA MODELING (MMM 2022), PT I
Keywords
Speech Emotion Recognition, Convolutional Neural Network, Time-Frequency Attention, Low-Level Emotion Feature
DocType
Conference
Volume
13141
ISSN
0302-9743
Citations
1
PageRank
0.37
References
0
Authors
4
Name          Order  Citations  PageRank
Ke Liu        1      1          0.37
Chen Wang     2      3          9.53
Jiayue Chen   3      1          0.37
Jun Feng      4      1          2.74