Title | ||
---|---|---|
Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification |
Abstract | ||
---|---|---|
Several studies have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the high performance of the models. In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC). In addition, we use the focal loss to maximize the synergy between SD and STC for MOS prediction. Experiments using the MOS evaluation results of the Voice Conversion Challenge 2018 show that the proposed MTL with two auxiliary tasks improves MOS prediction. Our proposed model achieves up to 11.6% relative improvement in performance over the baseline model. |
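
The abstract names a multi-task objective that combines MOS prediction with two classification tasks trained under the focal loss. The following is a minimal sketch of how such an objective could be assembled, not the authors' implementation. The squared-error MOS term, the task weights `alpha` and `beta`, the focusing parameter `gamma`, and all function names are assumptions made for illustration; the record does not specify them.

```python
import math

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probs  : list of class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter (assumed value; gamma=0 gives cross-entropy)
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def multitask_loss(mos_pred, mos_true, sd_probs, sd_label,
                   stc_probs, stc_label, alpha=1.0, beta=1.0, gamma=2.0):
    """Hypothetical combined objective: MOS regression plus two auxiliary tasks.

    alpha and beta are assumed weights for spoofing detection (SD) and
    spoofing type classification (STC); they are not taken from the paper.
    """
    mos_term = (mos_pred - mos_true) ** 2            # assumed squared-error MOS loss
    sd_term = focal_loss(sd_probs, sd_label, gamma)   # binary: natural vs. synthesized
    stc_term = focal_loss(stc_probs, stc_label, gamma)  # multi-class: which system
    return mos_term + alpha * sd_term + beta * stc_term
```

With `gamma=0` the focal term reduces to ordinary cross-entropy; larger values shrink the contribution of examples the classifier already gets right, so harder examples dominate the auxiliary losses.
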
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/SLT48900.2021.9383533 | 2021 IEEE Spoken Language Technology Workshop (SLT) |
Keywords | DocType | ISSN |
---|---|---|
Speech synthesis, MOS prediction, multi-task learning, spoofing detection, spoofing type classification | Conference | 2639-5479 |
ISBN | Citations | PageRank |
---|---|---|
978-1-7281-7067-1 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Yeunju Choi | 1 | 2 | 2.37 |
Youngmoon Jung | 2 | 3 | 4.42 |
Hoi-Rin Kim | 3 | 102 | 20.64 |