Title
Minimum Trajectory Error Training For Deep Neural Networks, Combined With Stacked Bottleneck Features
Abstract
Recently, Deep Neural Networks (DNNs) have shown promise as an acoustic model for statistical parametric speech synthesis. Their ability to learn complex mappings from linguistic features to acoustic features has significantly advanced the naturalness of synthesised speech. However, because DNN parameter estimation methods typically attempt to minimise the mean squared error of each individual frame in the training data, the dynamic and continuous nature of speech parameters is neglected. In this paper, we propose a training criterion that minimises speech parameter trajectory errors, and so takes dynamic constraints from a wide acoustic context into account during training. We combine this novel training criterion with our previously proposed stacked bottleneck features, which provide wide linguistic context. Both objective and subjective evaluation results confirm the effectiveness of the proposed training criterion in improving model accuracy and the naturalness of synthesised speech.
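The abstract's core idea can be sketched as follows: instead of measuring error on each predicted frame independently, first recover the smooth parameter trajectory implied by the predicted static and delta features (via least-squares parameter generation) and measure error on that trajectory. The snippet below is a simplified illustration under assumed unit variances and first-order deltas only, not the paper's exact formulation; `delta_matrix` and `trajectory_error` are hypothetical helper names.

```python
import numpy as np

def delta_matrix(T):
    """Build the 2T x T matrix W that maps a static trajectory c to
    stacked [static; delta] features, using the common first-order
    delta window delta_t = 0.5 * (c_{t+1} - c_{t-1})."""
    I = np.eye(T)
    D = np.zeros((T, T))
    for t in range(T):
        if t > 0:
            D[t, t - 1] = -0.5
        if t < T - 1:
            D[t, t + 1] = 0.5
    return np.vstack([I, D])

def trajectory_error(pred_static_delta, target_static):
    """Recover the trajectory c minimising ||W c - o_hat||^2 from the
    predicted static+delta vector o_hat (a unit-variance simplification
    of maximum-likelihood parameter generation), then return the MSE
    between the recovered and target static trajectories."""
    T = target_static.shape[0]
    W = delta_matrix(T)
    c, *_ = np.linalg.lstsq(W, pred_static_delta, rcond=None)
    return float(np.mean((c - target_static) ** 2))
```

A per-frame MSE criterion would compare `pred_static_delta` to the target features frame by frame; the trajectory criterion instead scores the generated trajectory, so an error in one frame's delta prediction propagates across neighbouring frames, which is what couples training to the dynamic constraints used at synthesis time.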
Year
2015
Venue
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords
Speech synthesis, acoustic model, deep neural network, trajectory error
Field
Bottleneck, Speech synthesis, Pattern recognition, Computer science, Naturalness, Mean squared error, Speech recognition, Parametric statistics, Artificial intelligence, Estimation theory, Trajectory, Acoustic model
DocType
Conference
Citations
3
PageRank
0.38
References
15
Authors
2
Name, Order, Citations, PageRank
Zhizheng Wu, 1, 565, 35.23
Simon King, 2, 19, 5.11