Paper Info

Title
A speaking rate-controlled Mandarin TTS system

Abstract
This paper proposes a new speaking rate-controlled Mandarin TTS system based on a speaking rate-dependent hierarchical prosodic model (SR-HPM) [6]. In the training phase, a data-driven approach automatically builds the SR-HPM directly from a large prosody-unlabeled speech database containing utterances at various speaking rates. The SR-HPM comprises 15 sub-models that describe the relationships among 3 types of prosodic-acoustic features of speech utterances, 2 types of prosodic tags specifying a 4-layer prosody hierarchy, linguistic features at various levels of the associated texts, and the speaking rates. In the test phase, the SR-HPM generates 4 prosodic-acoustic features: syllable pitch contours, syllable durations, syllable energy levels, and syllable-juncture pause durations. Combining these prosodic features with the spectral features generated by the HTS synthesizer, the system can generate natural speech at any speaking rate in the wide range of 0.15-0.3 seconds/syllable. The system also demonstrates a distinctive ability to control the occurrence frequencies of breaks of various types, as well as their pause durations, according to the given speaking rate. In a subjective test, the synthetic speech achieved MOS scores of 3.35, 3.44, and 3.28 at fast (SR = 0.17 sec/syllable), medium (SR = 0.2 sec/syllable), and slow (SR = 0.25 sec/syllable) rates, respectively.
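The abstract expresses speaking rate as seconds per syllable, so a smaller value means faster speech. As a minimal sketch of that unit convention (not part of the paper's system), the snippet below converts the three evaluated rates to syllables per second and checks them against the reported 0.15-0.3 seconds/syllable range. The function and constant names are hypothetical and chosen only for illustration; the numbers are taken from the abstract.

```python
# Supported speaking-rate range reported in the abstract (seconds/syllable).
SR_MIN, SR_MAX = 0.15, 0.30

# MOS scores reported in the abstract, keyed by speaking rate (seconds/syllable).
MOS_BY_RATE = {0.17: 3.35, 0.20: 3.44, 0.25: 3.28}


def syllables_per_second(sr: float) -> float:
    """Convert a speaking rate in seconds/syllable to syllables/second."""
    if sr <= 0:
        raise ValueError("speaking rate must be positive")
    return 1.0 / sr


def in_supported_range(sr: float) -> bool:
    """Return True if sr lies within the range reported in the abstract."""
    return SR_MIN <= sr <= SR_MAX


def summarize():
    """List (sec/syllable, syllables/sec, in range?, MOS), fastest rate first."""
    return [
        (sr, round(syllables_per_second(sr), 2), in_supported_range(sr), mos)
        for sr, mos in sorted(MOS_BY_RATE.items())
    ]


if __name__ == "__main__":
    for row in summarize():
        print(row)
```

Under this convention the "fast" condition (0.17 sec/syllable) corresponds to roughly 5.9 syllables per second and the "slow" condition (0.25 sec/syllable) to 4 syllables per second; all three evaluated rates fall inside the reported range.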

Year	2013
DOI	10.1109/ICASSP.2013.6638999
Venue	ICASSP
DocType	Conference
ISSN	1520-6149
Citations	2
PageRank	0.37
References	0
Authors	4
Field	Prosody, Juncture, Speech synthesis, Computer science, Speech recognition, Syllable, Hierarchy, Mandarin Chinese
Keywords	speaking rate modeling, hts synthesizer, syllable pitch contours, speaking rate-dependent hierarchical prosodic model, prosodic tags, sr-hpm, speaking rate-controlled tts, data-driven approach, speech synthesis, syllable energy levels, speech utterances, syllable juncture pause durations, 4-layer prosody hierarchy, prosodic-acoustic features, speaking rate-controlled mandarin tts system, linguistic features, text-to-speech synthesis, natural language processing, prosody-unlabeled speech database, training phase, mandarin prosody modeling, syllable durations, hidden markov models, high temperature superconductors, speech, pragmatics, energy states, databases

Authors (4 rows)

Name	Order	Citations	PageRank
Chiao-Hua Hsieh	1	6	0.89
Yih-Ru Wang	2	237	34.68
Chen-Yu Chiang	3	31	11.55
Sin-Horng Chen	4	273	39.86
