Title
Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification
Abstract
Several studies have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the high performance of the models. In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC). Besides, we use the focal loss to maximize the synergy between SD and STC for MOS prediction. Experiments using the MOS evaluation results of the Voice Conversion Challenge 2018 show that the proposed MTL with two auxiliary tasks improves MOS prediction. Our proposed model achieves up to 11.6% relative improvement in performance over the baseline model.
Year
2021
DOI
10.1109/SLT48900.2021.9383533
Venue
2021 IEEE Spoken Language Technology Workshop (SLT)
Keywords
Speech synthesis, MOS prediction, multi-task learning, spoofing detection, spoofing type classification
DocType
Conference
ISSN
2639-5479
ISBN
978-1-7281-7067-1
Citations
0
PageRank
0.34
References
0
Authors
3
Name
Order
Citations
PageRank
Yeunju Choi122.37
Youngmoon Jung234.42
Hoi-Rin Kim310220.64