Abstract |
---|
This paper describes the current status of the IBM Trainable Speech Synthesis System. The system is a state-of-the-art, trainable, unit-selection based concatenative speech synthesiser. The system uses hidden Markov models (HMMs) to provide a phonetic transcription and HMM state alignment of a database of single-speaker continuous-speech training data. The runtime synthesiser uses the HMM state sized segments that result as its basic synthesis units. It determines which segments to concatenate to produce a target sentence using decision trees built from the training data and a dynamic programming search to optimise a perceptually motivated cost function. The synthesiser can operate both in general domain Text-to-Speech mode, and in Phrase Splicing mode to provide higher quality synthesis in limited domains. Systems have been built in at least 10 different languages and over 70 voices. |
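The abstract describes choosing which segments to concatenate with a dynamic programming search over candidates supplied by decision trees. A minimal Viterbi-style sketch of that kind of search is given below. It is not the system's actual implementation: the abstract does not specify the cost function, so `target_cost`, `join_cost`, the `(name, pitch)` unit representation, and the function name `select_units` are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a candidate synthesis unit (representation is an assumption)


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Pick one unit per target position, minimising total cost.

    candidates[t] holds the candidate units for position t (in the paper
    these would come from decision-tree leaves; here they are just lists).
    Total cost = sum of target costs + sum of costs of adjacent joins.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[t][j]: cheapest cost of any path ending in candidates[t][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[t][j]: index of the predecessor on that cheapest path
    back = [[-1] * len(candidates[0])]

    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for u in candidates[t]:
            costs = [
                best[t - 1][i] + join_cost(prev, u)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[i_min] + target_cost(t, u))
            row_back.append(i_min)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[U] = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path, total
```

For example, with units represented as `(name, pitch)` pairs, a zero target cost, and the absolute pitch difference as a toy join cost, the search prefers the sequence whose neighbouring segments have the closest pitch. The run time is proportional to the number of positions times the square of the candidates per position, which is why the candidate lists are usually pruned before the search.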
Year | Venue | Field |
---|---|---|
2001 | SSW | Decision tree, Speech synthesis, IBM, Phonetic transcription, Computer science, Phrase, Speech recognition, Concatenation, Hidden Markov model, Sentence |
DocType | Citations | PageRank |
---|---|---|
Conference | 11 | 1.61 |
References | Authors |
---|---|
5 | 19 |
Name | Order | Citations | PageRank |
---|---|---|---|
Robert E. Donovan | 1 | 79 | 17.28 |
Abraham Ittycheriah | 2 | 534 | 61.23 |
Martin Franz | 3 | 483 | 53.56 |
Bhuvana Ramabhadran | 4 | 1779 | 153.83 |
Ellen Eide | 5 | 96 | 19.16 |
Mahesh Viswanathan | 6 | 2264 | 206.47 |
Raimo Bakis | 7 | 153 | 308.32 |
Wael Hamza | 8 | 198 | 15.84 |
Michael Picheny | 9 | 1461 | 920.15 |
P. Gleason | 10 | 11 | 1.61 |
T. Rutherfoord | 11 | 11 | 1.61 |
P. Cox | 12 | 11 | 1.61 |
D. Green | 13 | 11 | 2.28 |
Eric Janke | 14 | 48 | 9.98 |
S. Revelin | 15 | 11 | 1.95 |
Claire Waast-Richard | 16 | 56 | 6.58 |
B. Zeller | 17 | 11 | 1.61 |
C. Guenther | 18 | 11 | 1.61 |
J. Kunzmann | 19 | 11 | 1.61 |