Title
Minimum Trajectory Error Training For Deep Neural Networks, Combined With Stacked Bottleneck Features
Abstract
Recently, Deep Neural Networks (DNNs) have shown promise as an acoustic model for statistical parametric speech synthesis. Their ability to learn complex mappings from linguistic features to acoustic features has significantly advanced the naturalness of synthesised speech. However, because DNN parameter estimation methods typically attempt to minimise the mean squared error of each individual frame in the training data, the dynamic and continuous nature of speech parameters is neglected. In this paper, we propose a training criterion that minimises speech parameter trajectory errors, and so takes dynamic constraints from a wide acoustic context into account during training. We combine this novel training criterion with our previously proposed stacked bottleneck features, which provide wide linguistic context. Both objective and subjective evaluation results confirm the effectiveness of the proposed training criterion in improving model accuracy and the naturalness of synthesised speech.
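The abstract's core idea can be sketched as follows: instead of measuring error on each predicted frame independently, first recover the smooth parameter trajectory implied by the predicted static and delta features (via least-squares parameter generation) and measure error on that trajectory. The snippet below is a simplified illustration under assumed unit variances and first-order deltas only, not the paper's exact formulation; `delta_matrix` and `trajectory_error` are hypothetical helper names.

```python
import numpy as np

def delta_matrix(T):
    """Build the 2T x T matrix W that maps a static trajectory c to
    stacked [static; delta] features, using the common first-order
    delta window delta_t = 0.5 * (c_{t+1} - c_{t-1})."""
    I = np.eye(T)
    D = np.zeros((T, T))
    for t in range(T):
        if t > 0:
            D[t, t - 1] = -0.5
        if t < T - 1:
            D[t, t + 1] = 0.5
    return np.vstack([I, D])

def trajectory_error(pred_static_delta, target_static):
    """Recover the trajectory c minimising ||W c - o_hat||^2 from the
    predicted static+delta vector o_hat (a unit-variance simplification
    of maximum-likelihood parameter generation), then return the MSE
    between the recovered and target static trajectories."""
    T = target_static.shape[0]
    W = delta_matrix(T)
    c, *_ = np.linalg.lstsq(W, pred_static_delta, rcond=None)
    return float(np.mean((c - target_static) ** 2))
```

A per-frame MSE criterion would compare `pred_static_delta` to the target features frame by frame; the trajectory criterion instead scores the generated trajectory, so an error in one frame's delta prediction propagates across neighbouring frames, which is what couples training to the dynamic constraints used at synthesis time.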
Year
2015
Venue
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords
Speech synthesis, acoustic model, deep neural network, trajectory error
Field
Bottleneck, Speech synthesis, Pattern recognition, Computer science, Naturalness, Mean squared error, Speech recognition, Parametric statistics, Artificial intelligence, Estimation theory, Trajectory, Acoustic model
DocType
Conference
Citations
3
PageRank
0.38
References
15
Authors
2
Name, Order, Citations, PageRank
Zhizheng Wu, 1, 565, 35.23
Simon King, 2, 19, 5.11