Statistical Text-To-Speech Synthesis With Improved Dynamics - Citegraph

Paper Info

Title
Statistical Text-To-Speech Synthesis With Improved Dynamics

Abstract
In statistical TTS (STTS) systems, speech feature dynamics are modeled by first- and second-order feature frame differences, which typically do not satisfactorily represent the frame-to-frame feature dynamics present in natural speech. The reduced dynamics result in over-smoothing of speech features, which is often heard as muffled synthesized speech. To improve feature dynamics, a Global Variance approach has been suggested; however, it is computationally complex. We propose a different approach for modeling feature dynamics, based on applying the DFT to the whole set of feature frames representing a phoneme. In the transform domain, the inter-frame feature dynamics are then expressed in terms of inter-harmonic content, which can be modified to statistically match the dynamics of natural speech. To synthesize a whole utterance, we propose a method for smoothly combining the enhanced-dynamics phonemes, which improves the synthesized speech quality of STTS at a complexity similar to that of conventional STTS.
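
The transform-domain idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the actual modification rule, so the sketch assumes the simplest possible one, a single gain per feature dimension applied to all non-DC harmonics so that the temporal variance over the phoneme matches a target value. The function name `enhance_dynamics` and the parameter `target_var` (standing in for a statistic measured on natural speech) are hypothetical.

```python
import numpy as np

def enhance_dynamics(frames, target_var, eps=1e-12):
    """Rescale the non-DC harmonics of one phoneme's feature trajectories.

    frames:     (T, D) array, T feature frames of dimension D for one phoneme.
    target_var: (D,) array, assumed per-dimension temporal variance of
                natural speech (a hypothetical statistic for this sketch).
    """
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]

    # DFT along the time axis: bin 0 holds the per-dimension mean (DC),
    # the remaining bins hold the frame-to-frame dynamics.
    spectrum = np.fft.rfft(frames, axis=0)

    # Assumed rule: one gain per dimension, chosen so that the temporal
    # variance after the inverse transform equals target_var.
    current_var = frames.var(axis=0)
    gain = np.sqrt(np.asarray(target_var) / np.maximum(current_var, eps))

    # Leave the DC bin untouched so the phoneme mean is preserved.
    spectrum[1:] *= gain

    return np.fft.irfft(spectrum, n=n_frames, axis=0)

# Usage: an over-smoothed 10-frame, 3-dimensional trajectory.
rng = np.random.default_rng(0)
smooth = 0.1 * rng.standard_normal((10, 3))
enhanced = enhance_dynamics(smooth, target_var=np.array([1.0, 2.0, 3.0]))
```

With a uniform gain this is equivalent to scaling the deviation from the mean in the time domain; working on the harmonics becomes useful once each harmonic is given its own gain, which is closer in spirit to the inter-harmonic modification the abstract mentions. The smooth combination of adjacent phonemes is not covered here.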

Year	Venue	Keywords
2008	INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5	text-to-speech synthesis (TTS), statistical speech modeling, speech features dynamics, global variance
Field	DocType	Citations
Speech quality,Computer science,Utterance,Speech recognition,Smoothing,Text to speech synthesis,Natural language processing,Artificial intelligence	Conference	0
PageRank	References	Authors
0.34	4	2

Authors (2 rows)

Name	Order	Citations	PageRank
Stas Tiomkin	1	27	3.81
David Malah	2	219	60.95

Cited by (0 rows)

References (4 rows)
