Abstract

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker-independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that model adaptation to the target speaker improves accuracy. Our research question is whether a DNN for speech enhancement can be adapted to unknown speakers without any auxiliary guidance signal in the test phase. To achieve this, we adopt multi-task learning of speech enhancement and speaker identification, and use the output of the final hidden layer of the speaker identification branch as an auxiliary feature. In addition, we use multi-head self-attention to capture long-term dependencies in the speech and noise. Experimental results on a public dataset show that our strategy achieves state-of-the-art performance and also outperforms conventional methods in terms of subjective quality.
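The abstract describes a concrete architecture: a speaker-identification branch trained jointly with the enhancement network, whose final hidden-layer activation is fed back into the enhancement branch as a speaker-aware auxiliary feature, with multi-head self-attention capturing long-term dependencies. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; all layer sizes and names (`FEAT_DIM`, `HID_DIM`, `SpeakerAwareEnhancer`), the mean-pooling used to form the utterance-level speaker embedding, and the mask-based enhancement head are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the paper's code) of multi-task
# speech enhancement + speaker identification, where the speaker branch's
# final hidden activation serves as an auxiliary feature and multi-head
# self-attention models long-term dependencies over time frames.
import torch
import torch.nn as nn

FEAT_DIM, HID_DIM, N_SPEAKERS, N_HEADS = 257, 256, 100, 4  # assumed sizes

class SpeakerAwareEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        # Speaker-identification branch (trained with a speaker-ID loss).
        self.spk_hidden = nn.Sequential(
            nn.Linear(FEAT_DIM, HID_DIM), nn.ReLU(),
            nn.Linear(HID_DIM, HID_DIM), nn.ReLU(),
        )
        self.spk_out = nn.Linear(HID_DIM, N_SPEAKERS)
        # Enhancement branch: consumes noisy features concatenated with
        # the speaker embedding, then applies multi-head self-attention.
        self.enc = nn.Linear(FEAT_DIM + HID_DIM, HID_DIM)
        self.attn = nn.MultiheadAttention(HID_DIM, N_HEADS, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(HID_DIM, FEAT_DIM), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, frames, FEAT_DIM)
        h = self.spk_hidden(x)                 # frame-wise hidden activations
        spk_emb = h.mean(dim=1)                # utterance-level speaker feature
        spk_logits = self.spk_out(spk_emb)     # for the speaker-ID loss
        # Broadcast the speaker embedding over time, fuse with noisy input.
        aux = spk_emb.unsqueeze(1).expand(-1, x.size(1), -1)
        z = torch.relu(self.enc(torch.cat([x, aux], dim=-1)))
        z, _ = self.attn(z, z, z)              # long-term dependencies
        mask = self.mask(z)                    # time-frequency mask
        return x * mask, spk_logits            # enhanced features, ID logits
```

In training, a speaker-identification cross-entropy on `spk_logits` would be combined with an enhancement loss on the masked output; at test time the speaker branch runs on the noisy test utterance itself, so no external guidance signal is needed, which matches the abstract's self-adaptation setting.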
Year | DOI | Venue
---|---|---
2020 | 10.1109/ICASSP40776.2020.9053214 | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Keywords | DocType | ISSN
---|---|---
Speech enhancement, auxiliary information, multi-task learning, multi-head self-attention | Conference | 1520-6149

ISBN | Citations | PageRank
---|---|---
978-1-5090-6632-2 | 4 | 0.42

References | Authors
---|---
8 | 5
Name | Order | Citations | PageRank
---|---|---|---
Koizumi Yuma | 1 | 41 | 11.75 |
Kohei Yatabe | 2 | 16 | 10.36 |
Marc Delcroix | 3 | 699 | 62.07 |
Yoshiki Masuyama | 4 | 11 | 5.66 |
Daiki Takeuchi | 5 | 5 | 3.43 |