Title
Rethinking Auditory Affective Descriptors Through Zero-Shot Emotion Recognition in Speech
Abstract
Zero-shot speech emotion recognition (SER) enables machines to recognize emotional states in speech that were unseen during training, in contrast to conventional SER, which operates in a fully supervised setting. To address the zero-shot SER task, auditory affective descriptors (AADs) are typically employed to transfer affective knowledge from seen to unseen emotional states. However, it remains unclear which types of AADs best describe emotional states in speech during this transfer. In this regard, we define and investigate three types of AADs, namely per-emotion semantic-embedding, per-emotion manually annotated, and per-sample manually annotated AADs, through zero-shot emotion recognition in speech. This leads to a systematic design comprising prototype-based and annotation-based zero-shot SER modules, which take per-emotion and per-sample AADs as input, respectively. We then perform extensive experimental comparisons between human- and machine-derived AADs on the French emotional speech corpus CINEMO for positive-negative (PN) and within-negative (WN) tasks. The experimental results indicate that semantic-embedding prototypes obtained from pretrained models can outperform manually annotated emotional dimensions in zero-shot SER. The results further suggest that, with the help of sufficiently powerful pretrained models, machines can understand and describe affective information in speech better than humans.
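For intuition only, the following is a minimal sketch of the prototype-based zero-shot idea outlined in the abstract, not the paper's implementation: an utterance embedding, assumed to be already projected into the same space as per-emotion semantic-embedding prototypes (e.g., text-encoder embeddings of the emotion labels), is assigned to the nearest unseen-emotion prototype by cosine similarity. All vectors below are random placeholders standing in for outputs of pretrained speech and text models.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-emotion semantic-embedding prototypes for unseen emotions;
# in practice these would come from a pretrained word or sentence encoder.
unseen_prototypes = {
    "anger": rng.normal(size=300),
    "sadness": rng.normal(size=300),
    "fear": rng.normal(size=300),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def predict_unseen_emotion(utterance_embedding: np.ndarray,
                           prototypes: dict) -> str:
    """Return the unseen emotion whose prototype is most similar to the
    (projected) utterance embedding."""
    return max(prototypes, key=lambda e: cosine(utterance_embedding, prototypes[e]))

# Placeholder utterance embedding, e.g. the output of an acoustic encoder
# followed by a learned projection into the semantic space.
utterance = rng.normal(size=300)
print(predict_unseen_emotion(utterance, unseen_prototypes))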
Year
2022
DOI
10.1109/TCSS.2021.3130401
Venue
IEEE Transactions on Computational Social Systems
Keywords
Auditory affective descriptors (AADs), semantic-embedding prototypes, speech emotion recognition (SER), zero-shot emotion recognition
DocType
Journal
Volume
9
Issue
5
ISSN
2329-924X
Citations
0
PageRank
0.34
References
0
Authors
7
Name, Order, Citations, PageRank
Xin Xu, 1, 16240.08
Jia Deng, 2, 10850539.69
Zixing Zhang, 3, 39731.73
Xin Fan, 4, 776104.55
Zhaoxiang Zhang, 5, 102299.76
L. Devillers, 6, 00.34
Björn Schuller, 7, 6749463.50