Modeling of speaking rate influences on Mandarin speech prosody and its application to speaking rate-controlled TTS

Paper Info

Title
Modeling of speaking rate influences on Mandarin speech prosody and its application to speaking rate-controlled TTS

Abstract
A new data-driven approach to building a speaking rate-dependent hierarchical prosodic model (SR-HPM), directly from a large prosody-unlabeled speech database containing utterances of various speaking rates, to describe the influences of speaking rate on Mandarin speech prosody is proposed. It is an extended version of the existing HPM model which contains 12 sub-models to describe various relationships of prosodic-acoustic features of speech signal, linguistic features of the associated text, and prosodic tags representing the prosodic structure of speech. Two main modifications are suggested. One is designing proper normalization functions from the statistics of the whole database to compensate the influences of speaking rate on all prosodic-acoustic features. Another is modifying the HPM training to let its parameters be speaking-rate dependent. Experimental results on a large Mandarin read speech corpus showed that the parameters of the SR-HPM together with these feature normalization functions interpreted the effects of speaking rate on Mandarin speech prosody very well. An application of the SR-HPM to design and implement a speaking rate-controlled Mandarin TTS system is demonstrated. The system can generate natural synthetic speech for any given speaking rate in a wide range of 3.4-6.8 syllables/sec. Two subjective tests, MOS and preference test, were conducted to compare the proposed system with the popular HTS system. The MOS scores of the proposed system were in the range of 3.58-3.83 for eight different speaking rates, while they were in 3.09-3.43 for HTS. Besides, the proposed system had higher preference scores (49.8%-79.6%) than those (9.8%-30.7%) of HTS. This confirmed the effectiveness of the speaking rate control method of the proposed TTS system.

Year	DOI	Venue
2014	10.1109/TASLP.2014.2321482	IEEE/ACM Transactions on Audio, Speech & Language Processing
Keywords	Field	DocType
speaking rate control method,speech processing,algorithms,design,prosodic structure,speaking rate-dependent hierarchical prosodic model,prosodic tags,experimentation,model development,natural synthetic speech,statistics,hts system,normalization functions,speaking rate-controlled tts,mandarin speech prosody,speech signal,sound and music computing,speech synthesis,mos scores,measurement,prosodic-acoustic features,mandarin prosody modeling,sr-hpm,utterances,natural language processing,prosody-unlabeled speech database,speaking rate modeling,modeling,speaking rate-controlled mandarin tts system,performance,pragmatics,speech,databases	Speech corpus,Prosody,Speech processing,Normalization (statistics),Pragmatics,Computer science,Speech recognition,Natural language processing,Artificial intelligence,Rate control method,Mandarin Chinese,Speech processing speech synthesis	Journal
Volume	Issue	ISSN
22	7	2329-9290
Citations	PageRank	References
4	0.52	21
Authors
7

Authors (7 rows)

Name	Order	Citations	PageRank
Sin-Horng Chen	1	273	39.86
Chiao-Hua Hsieh	2	6	0.89
Chen-Yu Chiang	3	31	11.55
Hsi-Chun Hsiao	4	4	0.86
Yih-Ru Wang	5	237	34.68
Yuan-Fu Liao	6	73	20.38
Hsiu-Min Yu	7	11	2.80
