Paper Info

Title
Fast Concatenative Speech Synthesis Using Pre-Fused Speech Units Based on the Plural Unit Selection and Fusion Method

Abstract
We have previously developed a concatenative speech synthesizer based on the plural speech unit selection and fusion method, which can synthesize stable and human-like speech. In this method, plural speech units for each speech segment are selected using a cost function and fused by averaging pitch-cycle waveforms. This method has a large computational cost, but some platforms require a speech synthesis system that can work within limited hardware resources. In this paper, we propose an offline unit fusion method that reduces the computational cost. In the proposed method, speech units are fused in advance to make a pre-fused speech unit database. At synthesis time, a speech unit for each segment is selected from the pre-fused speech unit database, and the speech waveform is synthesized by applying prosodic modification and concatenation, without the computationally expensive unit fusion process. We compared several algorithms for constructing the pre-fused speech unit database. Subjective and objective evaluations confirm the effectiveness of the proposed method: the quality of synthetic speech from the offline unit fusion method with a 100 MB database is close to that of the online unit fusion method with the 93 MB JP database and slightly lower than that of the 390 MB US database, while the computational time is reduced by 80%. We also show that the frequency-weighted VQ-based method is effective for constructing the pre-fused speech unit database.
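
The offline fusion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes every candidate unit for a segment already has the same number of pitch cycles and the same cycle length, keeps a single pre-fused unit per segment label, and omits cost-based selection, prosodic modification, and the VQ-based database construction. The names `fuse_units`, `build_prefused_db`, and `synthesize` are hypothetical.

```python
from statistics import fmean


def fuse_units(units):
    """Fuse candidate units by averaging corresponding pitch-cycle waveforms.

    Each unit is a list of pitch cycles; each cycle is a list of samples.
    Assumption for this sketch: all units are already aligned, i.e. they
    have the same number of cycles and the same number of samples per cycle.
    """
    fused = []
    for cycle_index in range(len(units[0])):
        cycles = [unit[cycle_index] for unit in units]
        # Average sample by sample across the candidate units.
        fused.append([fmean(samples) for samples in zip(*cycles)])
    return fused


def build_prefused_db(candidates_by_segment):
    """Offline step: fuse the candidates for each segment label once."""
    return {
        segment: fuse_units(units)
        for segment, units in candidates_by_segment.items()
    }


def synthesize(segments, prefused_db):
    """Synthesis-time step: look up pre-fused units and concatenate them.

    No fusion happens here; prosodic modification is omitted in this sketch.
    """
    waveform = []
    for segment in segments:
        for cycle in prefused_db[segment]:
            waveform.extend(cycle)
    return waveform


if __name__ == "__main__":
    candidates = {
        "a": [
            [[0.0, 2.0], [4.0, 6.0]],
            [[2.0, 4.0], [6.0, 8.0]],
        ],
    }
    db = build_prefused_db(candidates)
    print(synthesize(["a", "a"], db))
```

The point of the split is that the averaging cost is paid once in `build_prefused_db`, so the synthesis-time path reduces to lookup and concatenation.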

Year	DOI	Venue
2007	10.1093/ietisy/e90-d.2.544	IEICE Transactions
Keywords	Field	DocType
speech synthesis system,human-like speech,speech segment,pre-fused speech units,offline unit fusion method,concatenative speech synthesizer,pre-fused speech unit database,fusion method,speech unit,fast concatenative speech,plural unit selection,plural speech unit selection,plural speech unit,prosody,cost function,algorithm,concatenation,speech synthesis,computational complexity,sound quality,database	Prosody,Speech synthesis,Speech coding,Pattern recognition,Voice activity detection,Computer science,Waveform,Speech recognition,Sound quality,Artificial intelligence,Concatenation,Computational complexity theory	Journal
Volume	Issue	ISSN
E90-D	2	1745-1361
Citations	PageRank	References
0	0.34	0
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Masatsune Tamura	1	107	15.26
Tatsuya Mizutani	2	13	2.79
Takehiko Kagoshima	3	42	8.66
