Title |
---|
Efficient neural speech synthesis for low-resource languages through multilingual modeling |
Abstract |
---|
Recent advances in neural TTS have led to models that can produce high-quality synthetic speech. However, these models typically require large amounts of training data, which can make it costly to produce a new voice with the desired quality. Although multi-speaker modeling can reduce the data requirements necessary for a new voice, this approach is usually not viable for many low-resource languages for which abundant multi-speaker data is not available. In this paper, we therefore investigated to what extent multilingual multi-speaker modeling can be an alternative to monolingual multi-speaker modeling, and explored how data from foreign languages may best be combined with low-resource language data. We found that multilingual modeling can increase the naturalness of low-resource language speech, showed that multilingual models can produce speech with a naturalness comparable to monolingual multi-speaker models, and saw that the target language naturalness was affected by the strategy used to add foreign language data. |
Year | DOI | Venue |
---|---|---|
2020 | 10.21437/Interspeech.2020-2664 | INTERSPEECH |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Marcel de Korte | 1 | 0 | 1.01 |
Jae-Bok Kim | 2 | 30 | 4.43 |
Esther Klabbers | 3 | 161 | 26.76 |