Title |
---|
Novel Applications of Neural Networks in Speech Technology Systems: Search Space Reduction and Prosodic Modeling |
Abstract |
---|
Neural networks (NNs) have been extensively used in speech technology systems. In this paper, we present two novel applications of NNs in speech recognition and text-to-speech systems. In very large vocabulary speech recognition systems using the hypothesis-verification paradigm, the verification stage is usually the most time-consuming. State-of-the-art systems combine fixed-size hypothesized search spaces with advanced pruning techniques. We propose a novel strategy to dynamically calculate the hypothesized search space, using neural networks as the estimation module and designing the input feature set with a careful greedy-based selection approach. The main achievement has been a statistically significant relative decrease in error rate of 33.53%, together with a relative decrease in average computational demands of up to 19.40%. Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially in a high-quality, restricted-domain system with a female voice. Our twofold objective is to obtain accurate predictors for both the fundamental frequency (F0) curve and phoneme duration in a Spanish text-to-speech system by minimizing the model estimation error with a neural network estimator, which has proved to be an excellent tool for this modeling. The resulting system predicts prosody with very good results (for duration: an RMS error of 15.5 ms and a correlation factor of 0.8975; for F0: an RMS error of 19.80 Hz and a relative RMS error of 0.43), which clearly improve on our previous rule-based system. |
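The abstract reports its results as standard summary statistics: RMS error and a correlation factor for the prosody predictors, and relative decreases for the recognizer. The following is a minimal Python sketch of how such figures can be computed, not the authors' evaluation code. The function names are hypothetical, and the definition of relative RMS error (RMS error divided by the standard deviation of the reference values) is an assumption, since the abstract does not define it.

```python
import math
from typing import Sequence


def _check(predicted: Sequence[float], reference: Sequence[float]) -> int:
    """Validate that both sequences are non-empty and aligned; return their length."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return len(predicted)


def rms_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the data (e.g. ms or Hz)."""
    n = _check(predicted, reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def correlation(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and reference values."""
    n = _check(predicted, reference)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_r)


def relative_rms_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """ASSUMPTION: RMS error divided by the population standard deviation of the
    reference, so 1.0 equals the error of always predicting the reference mean."""
    n = _check(predicted, reference)
    mean_r = sum(reference) / n
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in reference) / n)
    if std_r == 0:
        raise ValueError("relative RMS error is undefined for a constant reference")
    return rms_error(predicted, reference) / std_r


def relative_decrease(baseline: float, new: float) -> float:
    """Relative decrease of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    # Toy phoneme durations in ms (made-up values, not data from the paper).
    reference = [80.0, 95.0, 60.0, 110.0]
    predicted = [85.0, 90.0, 65.0, 100.0]
    print(f"RMS error: {rms_error(predicted, reference):.2f} ms")
    print(f"correlation: {correlation(predicted, reference):.4f}")
    print(f"relative RMS error: {relative_rms_error(predicted, reference):.2f}")
    # E.g. an error rate falling from 10.0% to 7.5% is a 25% relative decrease.
    print(f"relative decrease: {relative_decrease(10.0, 7.5):.2f}%")
```

Population statistics (dividing by n) are used throughout; the original evaluation may have used a different normalization.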
Year | Venue | Keywords |
---|---|---|
2009 | INTELLIGENT AUTOMATION AND SOFT COMPUTING | Speech recognition, neural networks, search space reduction, hypothesis-verification systems, greedy methods, feature set selection, prosody, F0 modeling, duration modeling, text-to-speech, parameter coding |
Field | DocType | Volume |
---|---|---|
Prosody, Fundamental frequency, Computer science, Time delay neural network, Artificial intelligence, Artificial neural network, Speech technology, Speech synthesis, Pattern recognition, Speech recognition, Root-mean-square deviation, Machine learning, Estimator | Journal | 15 |
Issue | ISSN | Citations |
---|---|---|
4 | 1079-8587 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 7 | 10 |
Name | Order | Citations | PageRank |
---|---|---|---|
J. MACIAS-GUARASA | 1 | 33 | 4.51 |
Juan Manuel Montero | 2 | 218 | 31.51 |
J. FERREIROS | 3 | 112 | 14.84 |
Ricardo De Córdoba | 4 | 142 | 25.58 |
R. SAN-SEGUNDO | 5 | 139 | 14.28 |
J. GUTIERREZ-ARRIOLA | 6 | 2 | 0.70 |
L. F. D'HARO | 7 | 33 | 2.83 |
F. FERNANDEZ | 8 | 0 | 0.34 |
R. BARRA | 9 | 1 | 0.69 |
J. M. PARDO | 10 | 0 | 0.34 |