Title
Statistical Text-To-Speech Synthesis With Improved Dynamics
Abstract
In statistical text-to-speech (STTS) systems, speech feature dynamics are modeled by first- and second-order differences between feature frames, which typically do not adequately represent the frame-to-frame dynamics present in natural speech. The reduced dynamics result in over-smoothing of the speech features, so the synthesized speech often sounds muffled. A Global Variance approach has been suggested to improve feature dynamics, but it is computationally complex. We propose a different approach to modeling feature dynamics, based on applying the DFT to the whole set of feature frames representing a phoneme. In the transform domain, the inter-frame feature dynamics are expressed in terms of inter-harmonic content, which can be modified to statistically match the dynamics of natural speech. To synthesize a whole utterance, we propose a method for smoothly combining the enhanced-dynamics phonemes. The resulting system improves the synthesized speech quality of STTS at a complexity similar to that of conventional STTS.
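The per-phoneme DFT idea can be illustrated with a minimal sketch. The abstract does not say how the inter-harmonic content is modified, so the code below assumes one simple possibility: rescaling the magnitude of each non-DC harmonic to a target value while keeping its phase. The function name `enhance_phoneme_dynamics` and the `target_mag` statistics are hypothetical, and the step that smoothly combines phonemes into an utterance is not covered.

```python
import numpy as np

def enhance_phoneme_dynamics(frames, target_mag, floor=1e-8):
    """Rescale the DFT harmonics of one phoneme's feature trajectory.

    frames:     (T, D) array, T feature frames of dimension D for one phoneme.
    target_mag: (T // 2 + 1, D) array of target harmonic magnitudes
                (hypothetical statistics, e.g. estimated from natural speech).
    Assumption: only magnitudes are matched; phases are left unchanged.
    """
    num_frames = frames.shape[0]
    # DFT along the time axis: row k is harmonic k of each feature dimension.
    spectrum = np.fft.rfft(frames, axis=0)
    mag = np.abs(spectrum)
    gain = np.ones_like(mag)
    # Harmonic 0 carries the per-dimension mean over the phoneme; keep it as is.
    gain[1:] = target_mag[1:] / np.maximum(mag[1:], floor)
    # Back to the frame domain with the original number of frames.
    return np.fft.irfft(spectrum * gain, n=num_frames, axis=0)
```

Because harmonic 0 is left untouched, the mean of each feature over the phoneme is preserved, and only the frame-to-frame variation around that mean changes.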
Year
2008
Venue
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5
Keywords
text-to-speech synthesis (TTS), statistical speech modeling, speech features dynamics, global variance
Field
Speech quality, Computer science, Utterance, Speech recognition, Smoothing, Text to speech synthesis, Natural language processing, Artificial intelligence
DocType
Conference
Citations
0
PageRank
0.34
References
4
Authors
2
Name, Order, Citations, PageRank
Stas Tiomkin, 1, 27, 3.81
David Malah, 2, 219, 60.95