Title
Using Deep Bidirectional Recurrent Neural Networks For Prosodic-Target Prediction In A Unit-Selection Text-To-Speech System
Abstract
Deeply-stacked Bidirectional Recurrent Neural Networks (BiRNNs) can capture complex short- and long-term context dependencies between predictors and targets: their recurrent hidden layers accumulate information from all preceding and future observations, so each predicted target depends non-linearly on the entire observation sequence. This property makes them desirable for tasks such as the prediction of prosodic contours for text-to-speech systems, where the surface prosody can result from the interaction between local and non-local features. Although previous work has demonstrated that they attain state-of-the-art performance for this task within a parametric synthesis framework, their use within unit-selection synthesis systems remains unexplored. In this work we deploy this class of models within a unit-selection system, investigate their effect on the outcome of the unit search, and perceptually evaluate this approach against the decision-tree-based baseline.
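The dependence of each prediction on both past and future context, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' model: it is a single-layer, single-unit bidirectional RNN in plain Python, whereas the abstract refers to deeply stacked networks. The names `rnn_pass`, `birnn_predict`, and `PARAMS`, along with all weights and input values, are hypothetical and chosen only for illustration.

```python
import math

def rnn_pass(inputs, w_in, w_rec):
    """Run a single-unit tanh RNN over `inputs`; return the hidden state at every step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # new state mixes current input and previous state
        states.append(h)
    return states

def birnn_predict(inputs, params):
    """Combine a forward and a backward pass into one output per time step."""
    fwd = rnn_pass(inputs, params["w_in_f"], params["w_rec_f"])
    # The backward pass reads the sequence in reverse; re-reverse so states align with time.
    bwd = rnn_pass(inputs[::-1], params["w_in_b"], params["w_rec_b"])[::-1]
    return [params["v_f"] * f + params["v_b"] * b + params["bias"]
            for f, b in zip(fwd, bwd)]

# Hypothetical, hand-picked weights (a real model would learn these from data).
PARAMS = {"w_in_f": 0.8, "w_rec_f": 0.5,
          "w_in_b": -0.6, "w_rec_b": 0.7,
          "v_f": 1.0, "v_b": 1.0, "bias": 0.0}

# Two toy feature sequences that differ only in the LAST observation.
seq_a = [0.1, 0.4, -0.3, 0.9]
seq_b = [0.1, 0.4, -0.3, -0.9]

out_a = birnn_predict(seq_a, PARAMS)
out_b = birnn_predict(seq_b, PARAMS)

# Forward-only states at the first step are identical for both sequences ...
fwd_first_a = rnn_pass(seq_a, PARAMS["w_in_f"], PARAMS["w_rec_f"])[0]
fwd_first_b = rnn_pass(seq_b, PARAMS["w_in_f"], PARAMS["w_rec_f"])[0]

# ... while the bidirectional output at the first step changes with the last input.
first_step_shift = abs(out_a[0] - out_b[0])
print(fwd_first_a == fwd_first_b, first_step_shift > 0.0)
```

In this toy setting, changing only the final observation alters the prediction at the first time step, which a forward-only recurrence cannot do. That is the non-local behaviour the abstract attributes to BiRNNs.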
Year
2015
Venue
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5
Keywords
speech synthesis, unit selection, recurrent neural networks, deep learning
Field
Speech synthesis, Computer science, Recurrent neural network, Speech recognition
DocType
Conference
Citations
2
PageRank
0.43
References
6
Authors
4
Name                  Order  Citations  PageRank
Raul Fernandez        1      8          1.36
Asaf Rendel           2      38         3.08
Bhuvana Ramabhadran   3      1779       153.83
Ron Hoory             4      181        19.16