Time-Frequency Attention for Speech Emotion Recognition with Squeeze-and-Excitation Blocks - Citegraph

Paper Info

Title
Time-Frequency Attention for Speech Emotion Recognition with Squeeze-and-Excitation Blocks

Abstract
In the field of Human-Computer Interaction (HCI), Speech Emotion Recognition (SER) is not only a fundamental step towards intelligent interaction but also plays an important role in smart environments such as elderly home monitoring. Most deep-learning-based SER systems focus on high-level emotion-relevant features and rarely analyze how low-level features differ between the time and frequency dimensions, which leads to unsatisfactory recognition accuracy. In this paper, we propose Time-Frequency Attention (TFA) to mine significant low-level emotion features from both the time domain and the frequency domain. To make full use of the global information after the feature fusion performed by TFA, we use Squeeze-and-Excitation (SE) blocks to compare emotion features across channels. Experiments are conducted on a benchmark database, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus. The results indicate that the proposed model outperforms state-of-the-art methods, with absolute improvements of 1.7% in average class accuracy over four emotion classes and 3.2% in weighted accuracy.
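The abstract names two mechanisms (TFA and SE blocks) and two metrics. The following is a minimal NumPy sketch, assuming a single feature map of shape (channels, time, frequency). The SE block follows the standard squeeze (global average pooling), excitation (FC, ReLU, FC, sigmoid) and channel-rescaling steps. The `time_frequency_attention` function, its mean-pooled scores and its multiplicative fusion are hypothetical stand-ins, since the abstract does not define TFA. The metric helpers assume the common SER convention that weighted accuracy is overall accuracy and average class accuracy is the mean of per-class recalls; all function and variable names are illustrative.

```python
import numpy as np


def softmax(v):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(v - v.max())
    return e / e.sum()


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def time_frequency_attention(x):
    """HYPOTHETICAL TFA sketch, not the paper's definition.

    x: feature map of shape (C, T, F) = (channels, time frames, frequency bins).
    Computes one weight per time frame and one per frequency bin, then rescales
    x along both axes; weights are scaled so each axis averages to 1.
    """
    c, t, f = x.shape
    time_w = softmax(x.mean(axis=(0, 2))) * t  # (T,) weight per frame
    freq_w = softmax(x.mean(axis=(0, 1))) * f  # (F,) weight per bin
    return x * time_w[None, :, None] * freq_w[None, None, :]


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation channel recalibration.

    x: (C, T, F); w1: (C // r, C); b1: (C // r,); w2: (C, C // r); b2: (C,).
    """
    z = x.mean(axis=(1, 2))             # squeeze: global average pool -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)    # excitation: FC + ReLU -> (C // r,)
    s = sigmoid(w2 @ h + b2)            # FC + sigmoid -> (C,) gates in (0, 1)
    return x * s[:, None, None], s      # rescale each channel by its gate


def weighted_accuracy(y_true, y_pred):
    # Overall accuracy: correct predictions / all utterances.
    return float(np.mean(y_true == y_pred))


def unweighted_accuracy(y_true, y_pred):
    # Average class accuracy: mean of per-class recalls.
    classes = np.unique(y_true)
    return float(np.mean([np.mean(y_pred[y_true == k] == k) for k in classes]))


# Usage with random weights (reduction ratio r is an assumed hyperparameter).
rng = np.random.default_rng(0)
C, T, F, r = 8, 50, 40, 4
x = rng.standard_normal((C, T, F))
y = time_frequency_attention(x)
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
out, gates = se_block(y, w1, b1, w2, b2)

# Toy, imbalanced 4-class labels (not IEMOCAP data).
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 3])
y_pred = np.array([0, 0, 0, 0, 0, 1, 2, 0])
print(weighted_accuracy(y_true, y_pred))    # 0.75
print(unweighted_accuracy(y_true, y_pred))  # 0.625
```

The toy labels show how the two metrics diverge when classes are imbalanced: the majority class is always recognized, which lifts weighted accuracy, while the missed minority class pulls average class accuracy down.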

Year	DOI	Venue
2022	10.1007/978-3-030-98358-1_42	MULTIMEDIA MODELING (MMM 2022), PT I
Keywords	DocType	Volume
Speech Emotion Recognition, Convolutional Neural Network, Time-Frequency Attention, Low-Level Emotion Feature	Conference	13141
ISSN	Citations	PageRank
0302-9743	1	0.37
References	Authors
0	4

Authors

Name	Order	Citations	PageRank
Ke Liu	1	1	0.37
Chen Wang	2	3	9.53
Jiayue Chen	3	1	0.37
Jun Feng	4	1	2.74