Bias And Statistical Significance In Evaluating Speech Synthesis With Mean Opinion Scores - Citegraph

Paper Info

Title
Bias And Statistical Significance In Evaluating Speech Synthesis With Mean Opinion Scores

Abstract
Listening tests and Mean Opinion Scores (MOS) are the most commonly used techniques for the evaluation of speech synthesis quality and naturalness. These are invaluable in the assessment of subjective qualities of machine generated stimuli. However, there are a number of challenges in understanding the MOS scores that come out of listening tests. Primarily, we advocate for the use of non-parametric statistical tests in the calculation of statistical significance when comparing listening test results. Additionally, based on the results of 46 legacy listening tests, we measure the impact of two sources of bias. Bias introduced by individual participants and synthesized text can have a dramatic impact on observed MOS scores. For example, we find that on average the mean difference between the highest and lowest scoring rater is over 2 MOS points (on a 5-point scale). From this observation, we caution against using any statistical test without adjusting for this bias, and provide specific non-parametric recommendations.
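The two quantities the abstract mentions can be illustrated with a minimal sketch. It computes the gap between the highest and lowest per-rater mean score, and a paired non-parametric comparison of two systems. The abstract does not name the specific tests the authors recommend, so the exact sign test used here is an assumption chosen only because it is simple and non-parametric; the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from math import comb
from statistics import mean


def rater_mean_range(ratings):
    """Gap between the highest and lowest per-rater mean score.

    ratings: iterable of (rater_id, score) pairs (hypothetical layout).
    """
    by_rater = defaultdict(list)
    for rater, score in ratings:
        by_rater[rater].append(score)
    means = [mean(scores) for scores in by_rater.values()]
    return max(means) - min(means)


def sign_test_p(scores_a, scores_b):
    """Two-sided exact sign test on paired scores; tied pairs are dropped.

    Each pair is assumed to come from the same rater and the same text,
    so rater and text effects cancel within a pair.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up ratings, for illustration only.
ratings = [("r1", 5), ("r1", 4), ("r2", 2), ("r2", 3), ("r3", 4), ("r3", 3)]
print(rater_mean_range(ratings))

system_a = [4, 5, 4, 4, 5, 3]
system_b = [3, 4, 3, 3, 4, 3]
print(sign_test_p(system_a, system_b))
```

Pairing scores by rater and text is one way to keep rater bias out of the comparison: the test only looks at the direction of each within-pair difference, not at the absolute scores.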

Year	DOI	Venue
2017	10.21437/Interspeech.2017-479	18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION
Keywords	Field	DocType
speech synthesis, listening tests, mean opinion score	Speech synthesis, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Statistical significance	Conference
ISSN	Citations	PageRank
2308-457X	4	0.39
References	Authors
0	2

Authors (2 rows)

Name	Order	Citations	PageRank
Andrew Rosenberg	1	12	2.53
Bhuvana Ramabhadran	2	1779	153.83
