Title | ||
---|---|---|
A combination of speaker normalization and speech rate normalization for automatic speech recognition |
Abstract | ||
---|---|---|
In this contribution, a normalization procedure for automatic speech recognition is introduced which aims at reducing speaking-rate-specific variations of the features of the phonetic classes. A "spurtwise" calculation of normalization factors makes it possible to capture changes of the speaking rate within one utterance. The cost-saving implementation, which uses linear interpolation of the original features and a word graph rescoring procedure, leads to a moderate increase in computational load compared to the baseline system without speech rate normalization. In addition, a two-step procedure which combines vocal tract length normalization (VTLN) and speech rate normalization (SRN) has been developed. Experiments showed that applying SRN to a VTLN-based recognition system leads to a relative reduction in word error rate of 4.2%. This is comparable to the decrease observed when using SRN on a system without VTLN. Overall, the combination of VTLN and SRN results in a 15% reduction of word error rate compared to the baseline system. |
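
The abstract mentions linear interpolation of the original features with one normalization factor per spurt. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names `stretch_features` and `normalize_spurts`, the `(frames, dimensions)` feature layout, and the convention that a factor above 1 lengthens a spurt are all assumptions made for illustration. How the factors are estimated and how word graph rescoring is done are not covered.

```python
import numpy as np


def stretch_features(features, factor):
    """Resample a (T, D) feature matrix along the time axis.

    Hypothetical helper: `factor` > 1 yields more frames (slower speech),
    `factor` < 1 yields fewer frames (faster speech).
    """
    features = np.asarray(features, dtype=float)
    n_in = features.shape[0]
    n_out = max(1, int(round(n_in * factor)))
    # Positions of the output frames on the input frame axis.
    src = np.linspace(0.0, n_in - 1, n_out)
    frame_idx = np.arange(n_in)
    # Interpolate each feature dimension independently.
    columns = [np.interp(src, frame_idx, features[:, d])
               for d in range(features.shape[1])]
    return np.stack(columns, axis=1)


def normalize_spurts(spurts, factors):
    """Apply one factor per spurt and concatenate the results in order."""
    stretched = [stretch_features(s, f) for s, f in zip(spurts, factors)]
    return np.concatenate(stretched, axis=0)
```

Because each spurt gets its own factor, a change of speaking rate inside one utterance can be handled by splitting the utterance into spurts first and passing the resulting list to `normalize_spurts`.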
Year | Venue | Keywords |
---|---|---|
2000 | INTERSPEECH | linear interpolation, automatic speech recognition, word error rate |
Field | DocType | Citations |
Normalization (statistics), Recognition system, Pattern recognition, Computer science, Voice activity detection, Word error rate, Utterance, Speech recognition, Artificial intelligence, Linear interpolation, Baseline system, Vocal tract | Conference | 7 |
PageRank | References | Authors |
0.59 | 15 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Thilo Pfau | 1 | 113 | 15.74 |
Robert Faltlhauser | 2 | 26 | 3.62 |
Günther Ruske | 3 | 154 | 36.13 |