Title
Thousands Of Voices For HMM-Based Speech Synthesis
Abstract
Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an 'average voice model' plus model adaptation) is robust to non-ideal speech data: data recorded under various conditions and with varying microphones, data that are not perfectly clean, and/or data that lack phonetic balance. This enables us to consider building high-quality voices on 'non-TTS' corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper we present thousands of voices for HMM-based speech synthesis that we have built from several popular ASR corpora, such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, GlobalPhone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.
Year
2009
Venue
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5
Keywords
speech synthesis, HMMs, speaker adaptation
Field
Resource management, Speech synthesis, Computer science, Speech recognition, Hidden Markov model, Perception
DocType
Conference
Citations
11
PageRank
0.78
References
3
Authors
12
Name               Order  Citations  PageRank
Junichi Yamagishi  1      1906       145.51
Bela Usabaev       2      45         3.30
Simon King         3      1438       114.49
Oliver Watts       4      1022       176.11
John Dines         5      11         0.78
Jilei Tian         6      584        35.38
Rile Hu            7      59         5.06
Yong Guan          8      11         0.78
Keiichiro Oura     9      227        21.43
Keiichi Tokuda     10     3016       250.00
Reima Karhila      11     94         9.68
Mikko Kurimo       12     908        93.37