Deep learning-based speaking rate-dependent hierarchical prosodic model for Mandarin TTS. - Citegraph

Paper Info

Title
Deep learning-based speaking rate-dependent hierarchical prosodic model for Mandarin TTS.

Abstract
The Speaking Rate-dependent Hierarchical Prosodic Model (SR-HPM) is a syllable-based statistical prosodic model that has successfully served as the prosody generation model in a speaking rate-controlled text-to-speech system for Mandarin and two Chinese dialects: Taiwan Min and Si-Xian Hakka. Motivated by the success of deep learning (DL) techniques in parametric speech synthesis based on the HMM-based speech synthesis system, this study aims to improve the performance of the SR-HPM in prosody generation by replacing the conventional cascaded statistical sub-models with DL-based models, yielding the DL-based SR-HPM. Each sub-model is first realized independently by a DL-based model specially designed for its input-output characteristics. Then, all sub-models are cascaded and unified into one deep neural structure whose parameters are obtained by end-to-end (linguistic feature to prosodic-acoustic feature) optimization. Subjective and objective tests show that the DL-based SR-HPM performs better than the conventional statistical SR-HPM in prosody generation.
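
The two-stage training idea in the abstract (fit each sub-model on its own, then cascade them and optimize end to end) can be illustrated with a toy sketch. This is not the paper's architecture: the data are synthetic, the two sub-models are a single tanh layer and a single linear layer, and every name and size here (`X`, `H`, `Y`, `W1`, `W2`, `cascade`, the 8/4/3 dimensions, the learning rate) is a hypothetical stand-in chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins; none of these sizes or values come from the paper.
X = rng.normal(size=(200, 8))                # plays the role of linguistic features
H = np.tanh(X @ rng.normal(size=(8, 4)))     # intermediate targets for sub-model 1
Y = H @ rng.normal(size=(4, 3))              # plays the role of prosodic-acoustic features


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


def cascade(X, W1, W2):
    """Sub-model 1 (tanh layer) feeding sub-model 2 (linear layer)."""
    h = np.tanh(X @ W1)
    return h, h @ W2


def grad_sub1(X, H_target, W1):
    """Gradient of the MSE between sub-model 1's output and its own target."""
    h = np.tanh(X @ W1)
    d_h = 2.0 * (h - H_target) / h.size
    return X.T @ (d_h * (1.0 - h ** 2))


def grad_sub2(H_in, Y_target, W2):
    """Gradient of the MSE for sub-model 2 when fed the reference intermediate values."""
    d_y = 2.0 * (H_in @ W2 - Y_target) / Y_target.size
    return H_in.T @ d_y


def grad_joint(X, Y_target, W1, W2):
    """Gradients of the final-output MSE with respect to both sub-models (backprop)."""
    h, y_hat = cascade(X, W1, W2)
    d_y = 2.0 * (y_hat - Y_target) / y_hat.size
    d_h = d_y @ W2.T
    return X.T @ (d_h * (1.0 - h ** 2)), h.T @ d_y


W1 = 0.1 * rng.normal(size=(8, 4))
W2 = 0.1 * rng.normal(size=(4, 3))
init_loss = mse(cascade(X, W1, W2)[1], Y)

lr = 0.02  # assumed step size for this toy problem

# Stage 1: each sub-model is fitted independently on its own input-output pair.
for _ in range(300):
    W1 = W1 - lr * grad_sub1(X, H, W1)
    W2 = W2 - lr * grad_sub2(H, Y, W2)
pretrained_loss = mse(cascade(X, W1, W2)[1], Y)

# Stage 2: the cascade is treated as one network and trained on the final output only.
for _ in range(300):
    g1, g2 = grad_joint(X, Y, W1, W2)
    W1, W2 = W1 - lr * g1, W2 - lr * g2
final_loss = mse(cascade(X, W1, W2)[1], Y)

print(init_loss, pretrained_loss, final_loss)
```

In the second stage the intermediate targets `H` are no longer used: only the error at the final output drives the update of both weight matrices, which is what distinguishes joint end-to-end optimization from training the sub-models separately.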

Year	Venue	Field
2017	Asia-Pacific Signal and Information Processing Association Annual Summit and Conference	Prosody, Speech synthesis, Pragmatics, Computer science, Speech recognition, Syllable, Artificial intelligence, Deep learning, Hidden Markov model, Artificial neural network, Mandarin Chinese
DocType	ISSN	Citations
Conference	2309-9402	0
PageRank	References	Authors
0.34	0	2

Authors (2 rows)

Name	Order	Citations	PageRank
Yen-Ting Lin	1	5	4.11
Chen-Yu Chiang	2	31	11.55