Title
Deep learning-based speaking rate-dependent hierarchical prosodic model for Mandarin TTS.
Abstract
The Speaking Rate-dependent Hierarchical Prosodic Model (SR-HPM) is a syllable-based statistical prosodic model that has successfully served as the prosody generation model in a speaking rate-controlled text-to-speech system for Mandarin and two Chinese dialects: Taiwan Min and Si-Xian Hakka. Motivated by the success of deep learning (DL) techniques in parametric speech synthesis based on the HMM-based speech synthesis system, this study aims to improve the prosody generation performance of the SR-HPM by replacing its conventional cascaded statistical sub-models with DL-based models, yielding the DL-based SR-HPM. Each sub-model is first realized independently by a DL-based model designed specifically for its input-output characteristics. All sub-models are then cascaded and unified into one deep neural structure whose parameters are obtained by end-to-end optimization, from linguistic features to prosodic-acoustic features. Subjective and objective tests show that the DL-based SR-HPM performs better than the conventional statistical SR-HPM in prosody generation.
Year
2017
Venue
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
Field
Prosody, Speech synthesis, Pragmatics, Computer science, Speech recognition, Syllable, Artificial intelligence, Deep learning, Hidden Markov model, Artificial neural network, Mandarin Chinese
DocType
Conference
ISSN
2309-9402
Citations
0
PageRank
0.34
References
0
Authors
2
Name            Order  Citations  PageRank
Yen-Ting Lin    1      5          4.11
Chen-Yu Chiang  2      31         11.55