Abstract | ||
---|---|---|
This paper describes reducing phone label errors in TTS voice building by modeling speaker pronunciation variants. Each speaker has his or her own unique pronunciations (and context-dependent variations), so no single standard lexicon can cover all of a speaker's variations. Creating speaker-dependent pronunciation lexicons for automatic speech labeling of our TTS voice databases helped to eliminate many pronunciation errors that resulted from mismatches between lexical pronunciations and how the speaker (voice talent) actually pronounced a word. We also found that it contributed to other improvements in synthesis quality. A perceptual test showed that our work contributed to MOS improvement for American English male and female voices. |
Year | Venue | Keywords |
---|---|---|
2004 | INTERSPEECH | context dependent,quality improvement |
Field | DocType | Citations |
Pronunciation,Computer science,Speech recognition,Lexicon,Artificial intelligence,Natural language processing | Conference | 6 |
PageRank | References | Authors |
0.68 | 2 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yeon-Jun Kim | 1 | 52 | 9.52 |
Ann K. Syrdal | 2 | 244 | 33.00 |
Alistair Conkie | 3 | 264 | 38.03 |