Abstract | ||
---|---|---|
Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an 'average voice model' plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on 'non-TTS' corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper we show thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, Globalphone and Speecon. We report some perceptual evaluation results and outline the outstanding issues. |
Year | Venue | Keywords |
---|---|---|
2009 | INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 | speech synthesis, HMMs, speaker adaptation |
Field | DocType | Citations |
Resource management, Speech synthesis, Computer science, Speech recognition, Hidden Markov model, Perception | Conference | 11 |
PageRank | References | Authors |
0.78 | 3 | 12 |
Name | Order | Citations | PageRank |
---|---|---|---|
Junichi Yamagishi | 1 | 1906 | 145.51 |
Bela Usabaev | 2 | 45 | 3.30 |
Simon King | 3 | 1438 | 114.49 |
Oliver Watts | 4 | 1022 | 176.11 |
John Dines | 5 | 11 | 0.78 |
Jilei Tian | 6 | 584 | 35.38 |
Rile Hu | 7 | 59 | 5.06 |
Yong Guan | 8 | 11 | 0.78 |
Keiichiro Oura | 9 | 227 | 21.43 |
Keiichi Tokuda | 10 | 3016 | 250.00 |
Reima Karhila | 11 | 94 | 9.68 |
Mikko Kurimo | 12 | 908 | 93.37 |