Using Deep Bidirectional Recurrent Neural Networks For Prosodic-Target Prediction In A Unit-Selection Text-To-Speech System - Citegraph

Paper Info

Title
Using Deep Bidirectional Recurrent Neural Networks For Prosodic-Target Prediction In A Unit-Selection Text-To-Speech System

Abstract
Deeply-stacked Bidirectional Recurrent Neural Networks (BiRNNs) are able to capture complex, short- and long-term, context dependencies between predictors and targets due to the non-linear dependency they introduce on the entire observation when predicting a target, thanks to the use of recurrent hidden layers that accumulate information from all preceding and future observations. This aspect of the model makes them desirable for tasks such as the prediction of prosodic contours for text-to-speech systems, where the surface prosody can be a result of the interaction between local and non-local features. Although previous work has demonstrated that they attain state-of-the-art performance for this task within a parametric synthesis framework, their use within unit-selection synthesis systems remains unexplored. In this work we deploy this class of models within a unit selection system, investigate their effect on the outcome of the unit search, and perceptually evaluate it against the baseline (decision-tree-based) approach.

Year	Venue	Keywords
2015	16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), Vols 1-5	speech synthesis, unit selection, recurrent neural networks, deep learning
Field	DocType	Citations
Speech synthesis, Computer science, Recurrent neural network, Speech recognition	Conference	2
PageRank	References	Authors
0.43	6	4

Authors (4 rows)

Name	Order	Citations	PageRank
Raul Fernandez	1	8	1.36
Asaf Rendel	2	38	3.08
Bhuvana Ramabhadran	3	1779	153.83
Ron Hoory	4	181	19.16
