Novel Applications Of Neural Networks In Speech Technology Systems: Search Space Reduction And Prosodic Modeling - Citegraph

Paper Info

Title
Novel Applications Of Neural Networks In Speech Technology Systems: Search Space Reduction And Prosodic Modeling

Abstract
Neural networks (NNs) have been extensively used in speech technology systems. In this paper, we present two novel applications of NNs in speech recognition and text-to-speech systems. In very large vocabulary speech recognition systems using the hypothesis-verification paradigm, the verification stage is usually the most time consuming. State-of-the-art systems combine fixed-size hypothesized search spaces with advanced pruning techniques. We propose a novel strategy to dynamically calculate the hypothesized search space, using neural networks as the estimation module and designing the input feature set with a careful greedy-based selection approach. The main achievement has been a statistically significant relative decrease in error rate of 33.53%, while getting a relative decrease in average computational demands of up to 19.40%. Prosodic modeling is one of the most important tasks in developing a new text-to-speech synthesizer, especially in a female-voice, high-quality, restricted-domain system. Our double objective is to get accurate predictors for both the fundamental frequency (F0) curve and phoneme duration by minimizing the model estimation error in a Spanish text-to-speech system, by means of a neural network estimator, which has proved to be an excellent tool for the modeling. The resulting system predicts prosody with very good results (for duration: 15.5 ms in RMS and a correlation factor of 0.8975; for F0: 19.80 Hz in RMS and a relative RMS error of 0.43) that clearly improve on our previous rule-based system.
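The abstract reports its results as RMS errors, a correlation factor, and relative decreases. As a minimal sketch of how such figures are commonly computed, the Python below uses the textbook definitions of root-mean-square error, Pearson correlation, and relative decrease. The function names and the sample values are hypothetical, chosen only for illustration; the paper's exact formulas (in particular its "relative RMS error") and its data are not given in this record.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def relative_decrease(baseline, new):
    """Relative decrease of `new` with respect to `baseline`, as a fraction."""
    return (baseline - new) / baseline


# Hypothetical phoneme durations in milliseconds (not taken from the paper).
observed_ms = [80.0, 120.0, 95.0, 60.0]
predicted_ms = [85.0, 110.0, 100.0, 70.0]

print("duration RMSE (ms):", rmse(predicted_ms, observed_ms))
print("duration correlation:", pearson(predicted_ms, observed_ms))
# Hypothetical error rates, shown only to illustrate the formula.
print("relative decrease:", relative_decrease(10.0, 8.0))
```

Under these definitions, a lower RMS error and a correlation closer to 1 both indicate predictions that track the reference values more closely.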

Year	2009
Venue	Intelligent Automation and Soft Computing
Keywords	Speech recognition, neural networks, search space reduction, hypothesis-verification systems, greedy methods, feature set selection, prosody, F0 modeling, duration modeling, text-to-speech, parameter coding
Field	Prosody, Fundamental frequency, Computer science, Time delay neural network, Artificial intelligence, Artificial neural network, Speech technology, Speech synthesis, Pattern recognition, Speech recognition, Root-mean-square deviation, Machine learning, Estimator
DocType	Journal
Volume	15
Issue	4
ISSN	1079-8587
Citations	0
PageRank	0.34
References	7
Authors	10

Authors (10 rows)

Name	Order	Citations	PageRank
J. MACIAS-GUARASA	1	33	4.51
Juan Manuel Montero	2	218	31.51
J. FERREIROS	3	112	14.84
Ricardo De Córdoba	4	142	25.58
R. SAN-SEGUNDO	5	139	14.28
J. GUTIERREZ-ARRIOLA	6	2	0.70
L. F. D'HARO	7	33	2.83
F. FERNANDEZ	8	0	0.34
R. BARRA	9	1	0.69
J. M. PARDO	10	0	0.34
