Title | ||
---|---|---|
Minimum Trajectory Error Training For Deep Neural Networks, Combined With Stacked Bottleneck Features |
Abstract | ||
---|---|---|
Recently, Deep Neural Networks (DNNs) have shown promise as an acoustic model for statistical parametric speech synthesis. Their ability to learn complex mappings from linguistic features to acoustic features has significantly advanced the naturalness of synthesised speech. However, because DNN parameter estimation methods typically attempt to minimise the mean squared error of each individual frame in the training data, the dynamic and continuous nature of speech parameters is neglected. In this paper, we propose a training criterion that minimises speech parameter trajectory errors, and so takes dynamic constraints from a wide acoustic context into account during training. We combine this novel training criterion with our previously proposed stacked bottleneck features, which provide wide linguistic context. Both objective and subjective evaluation results confirm the effectiveness of the proposed training criterion for improving model accuracy and naturalness of synthesised speech. |
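
The contrast the abstract draws between a per-frame error and a trajectory error can be illustrated with a small sketch. This is not the authors' implementation. It assumes a one-dimensional parameter stream, one static and one delta feature per frame, a delta window of `[-0.5, 0, 0.5]`, and the standard maximum-likelihood parameter generation (MLPG) solution for recovering a smooth trajectory from static and delta predictions. The names `build_window_matrix`, `mlpg`, `frame_mse`, and `trajectory_error` are hypothetical.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Return W of shape (2T, T) that maps a static trajectory to
    interleaved [static, delta] features (assumed delta window: [-0.5, 0, 0.5])."""
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0                      # static row
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5         # delta row, previous frame
        if t < num_frames - 1:
            W[2 * t + 1, t + 1] = 0.5          # delta row, next frame
    return W


def mlpg(obs, precision, W):
    """Solve c = (W^T P W)^{-1} W^T P o, with P = diag(precision)."""
    WtP = W.T * precision                      # scales column j of W^T by precision[j]
    return np.linalg.solve(WtP @ W, WtP @ obs)


def frame_mse(pred_obs, ref_obs):
    """Frame-level criterion: every static and delta value is scored independently."""
    return np.mean((pred_obs - ref_obs) ** 2)


def trajectory_error(pred_obs, ref_traj, precision, W):
    """Trajectory-level criterion: generate a trajectory first, then score it."""
    return np.mean((mlpg(pred_obs, precision, W) - ref_traj) ** 2)


if __name__ == "__main__":
    T = 8
    W = build_window_matrix(T)
    ref_traj = np.sin(np.linspace(0.0, np.pi, T))    # toy reference trajectory
    ref_obs = W @ ref_traj                           # its static + delta features
    precision = np.ones(2 * T)                       # assumed unit variances

    rng = np.random.default_rng(0)
    pred_obs = ref_obs + 0.1 * rng.standard_normal(2 * T)  # stand-in for DNN output

    print("frame-level MSE :", frame_mse(pred_obs, ref_obs))
    print("trajectory error:", trajectory_error(pred_obs, ref_traj, precision, W))
```

Because the generated trajectory at each frame depends on static and delta predictions from neighbouring frames, an error defined on it couples frames together, which is the sense in which such a criterion takes acoustic context into account. The variances, window coefficients, and boundary handling here are illustrative choices only.
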
Year | Venue | Keywords |
---|---|---|
2015 | 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5 | Speech synthesis, acoustic model, deep neural network, trajectory error |
Field | DocType | Citations |
---|---|---|
Bottleneck, Speech synthesis, Pattern recognition, Computer science, Naturalness, Mean squared error, Speech recognition, Parametric statistics, Artificial intelligence, Estimation theory, Trajectory, Acoustic model | Conference | 3 |
PageRank | References | Authors |
---|---|---|
0.38 | 15 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhizheng Wu | 1 | 565 | 35.23 |
Simon King | 2 | 19 | 5.11 |