Title | ||
---|---|---|
Deep learning-based speaking rate-dependent hierarchical prosodic model for Mandarin TTS. |
Abstract | ||
---|---|---|
The Speaking Rate-dependent Hierarchical Prosodic Model (SR-HPM) is a syllable-based statistical prosodic model that has successfully served as the prosody generation model in a speaking rate-controlled text-to-speech system for Mandarin and two Chinese dialects: Taiwan Min and Si-Xian Hakka. Motivated by the success of deep learning (DL) techniques in parametric speech synthesis built on the HMM-based speech synthesis system, this study aims to improve the prosody generation performance of the SR-HPM by replacing its conventional cascaded statistical sub-models with DL-based models, yielding the DL-based SR-HPM. Each sub-model is first realized independently by a DL-based model designed specifically for its input-output characteristics. All sub-models are then cascaded and unified into one deep neural structure whose parameters are obtained by end-to-end optimization, from linguistic features to prosodic-acoustic features. Subjective and objective tests show that the DL-based SR-HPM performs better than the conventional statistical SR-HPM in prosody generation. |
Year | Venue | Field |
---|---|---|
2017 | Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | Prosody, Speech synthesis, Pragmatics, Computer science, Speech recognition, Syllable, Artificial intelligence, Deep learning, Hidden Markov model, Artificial neural network, Mandarin Chinese |
DocType | ISSN | Citations |
Conference | 2309-9402 | 0 |
PageRank | References | Authors |
0.34 | 0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yen-Ting Lin | 1 | 5 | 4.11 |
Chen-Yu Chiang | 2 | 31 | 11.55 |