Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification - Citegraph

Paper Info

Title
Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning with Spoofing Detection and Spoofing Type Classification

Abstract
Several studies have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the high performance of the models. In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC). Besides, we use the focal loss to maximize the synergy between SD and STC for MOS prediction. Experiments using the MOS evaluation results of the Voice Conversion Challenge 2018 show that the proposed MTL with two auxiliary tasks improves MOS prediction. Our proposed model achieves up to 11.6% relative improvement in performance over the baseline model.
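
Illustrative sketch (not taken from the paper): the abstract describes a MOS prediction objective combined with two auxiliary classification tasks trained with the focal loss. The snippet below shows one way such a combined loss could be written for a single utterance. The function names, the squared-error MOS term, the task weights `alpha` and `beta`, and the focusing parameter `gamma=2.0` are all assumptions made for illustration; the record does not state the authors' actual formulation.

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    probs  : list of class probabilities (assumed to sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def multitask_loss(mos_pred, mos_true, sd_probs, sd_label,
                   stc_probs, stc_label, alpha=1.0, beta=1.0, gamma=2.0):
    """Hypothetical combined objective: MOS regression plus two auxiliary tasks.

    The squared-error MOS term and the weights alpha/beta are assumptions,
    not values or choices reported in the paper.
    """
    mos_term = (mos_pred - mos_true) ** 2           # main task: MOS regression
    sd_term = focal_loss(sd_probs, sd_label, gamma)   # spoofing detection (SD)
    stc_term = focal_loss(stc_probs, stc_label, gamma)  # spoofing type (STC)
    return mos_term + alpha * sd_term + beta * stc_term
```

With `gamma = 0` the focal term equals ordinary cross-entropy; larger `gamma` values shrink the contribution of examples the classifier already gets right, so the auxiliary tasks concentrate on harder utterances.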

Year	DOI	Venue
2021	10.1109/SLT48900.2021.9383533	2021 IEEE Spoken Language Technology Workshop (SLT)
Keywords	DocType	ISSN
Speech synthesis,MOS prediction,multi-task learning,spoofing detection,spoofing type classification	Conference	2639-5479
ISBN	Citations	PageRank
978-1-7281-7067-1	0	0.34
References	Authors
0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Yeunju Choi	1	2	2.37
Youngmoon Jung	2	3	4.42
Hoi-Rin Kim	3	102	20.64
