A combination of speaker normalization and speech rate normalization for automatic speech recognition - Citegraph

Paper Info

Title
A combination of speaker normalization and speech rate normalization for automatic speech recognition

Abstract
In this contribution, a normalization procedure for automatic speech recognition is introduced that aims to reduce speaking-rate-specific variations in the features of the phonetic classes. A "spurtwise" calculation of normalization factors makes it possible to capture changes in speaking rate within a single utterance. The cost-saving implementation, which uses linear interpolation of the original features and a word graph rescoring procedure, leads to a moderate increase in computational load compared to the baseline system without speech rate normalization. In addition, a two-step procedure that combines vocal tract length normalization (VTLN) and speech rate normalization (SRN) has been developed. Experiments showed that applying SRN to a VTLN-based recognition system leads to a relative reduction in word error rate of 4.2%. This is comparable to the decrease observed when using SRN on a system without VTLN. Overall, the combination of VTLN and SRN results in a 15% reduction in word error rate compared to the baseline system.
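
The abstract says only that SRN is implemented by linear interpolation of the original features, with one normalization factor per spurt. As a rough illustration of that idea, and not the authors' implementation, the sketch below resamples a feature matrix along the time axis. The function names, the (frames, dimensions) layout, and the convention that a factor above 1 lengthens a spurt are all assumptions made for this example. How the factors are estimated is not stated in the abstract and is left out.

```python
import numpy as np


def stretch_features(features: np.ndarray, factor: float) -> np.ndarray:
    """Resample a (frames, dims) feature matrix in time by linear interpolation.

    Hypothetical helper: factor > 1 yields more frames (slows fast speech),
    factor < 1 yields fewer frames (speeds up slow speech).
    """
    n_frames, n_dims = features.shape
    n_out = max(2, int(round(n_frames * factor)))
    src = np.arange(n_frames)
    # Fractional positions on the original frame axis where new frames are sampled.
    dst = np.linspace(0, n_frames - 1, n_out)
    # Interpolate each feature dimension independently along time.
    columns = [np.interp(dst, src, features[:, d]) for d in range(n_dims)]
    return np.stack(columns, axis=1)


def normalize_spurts(spurts, factors):
    """Apply one (assumed, precomputed) factor to each spurt of an utterance."""
    return [stretch_features(s, f) for s, f in zip(spurts, factors)]


# Toy spurt: 4 frames, 2 feature dimensions, stretched to 7 frames.
spurt = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
stretched = stretch_features(spurt, 1.75)
```

Because the new frames are computed directly from the existing ones, a scheme of this kind does not repeat the signal analysis for each candidate factor, which fits the abstract's description of the implementation as cost-saving.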

Year	Venue	Keywords
2000	INTERSPEECH	linear interpolation, automatic speech recognition, word error rate
Field	DocType	Citations
Normalization (statistics), Recognition system, Pattern recognition, Computer science, Voice activity detection, Word error rate, Utterance, Speech recognition, Artificial intelligence, Linear interpolation, Baseline system, Vocal tract	Conference	7
PageRank	References	Authors
0.59	15	3

Authors (3 rows)

Name	Order	Citations	PageRank
Thilo Pfau	1	113	15.74
Robert Faltlhauser	2	26	3.62
Günther Ruske	3	154	36.13
