Title
Studying the role of pitch-adaptive spectral estimation and speaking-rate normalization in automatic speech recognition.
Abstract
In automatic speech recognition (ASR) systems, the front-end acoustic features should not be affected by signal periodicity (pitch period). Motivated by this, we study the role of a pitch-synchronous spectrum estimation approach, referred to as TANDEM STRAIGHT. TANDEM STRAIGHT yields a smoother spectrum that is largely devoid of pitch harmonics. Consequently, the acoustic features derived from the smoothed spectra outperform conventional Mel-frequency cepstral coefficients (MFCC). The experimental evaluations reported in this paper are performed on speech data from a wide range of speakers belonging to different age groups, including children. The proposed features are found to be effective for all groups of speakers. To further improve the recognition of children's speech, the effect of vocal-tract length normalization (VTLN) is studied, and its inclusion further improves recognition performance. We also perform a detailed study of the effect of speaking-rate normalization (SRN) on children's speech recognition. An SRN technique based on anchoring glottal closure instants, estimated using zero-frequency filtering, is explored in this regard. SRN is observed to be highly effective for child speakers belonging to different age groups. Finally, all the studied techniques are combined for effective mismatch reduction. On the children's speech test set, the proposed features yield a relative improvement of 21.6% over MFCC features even after combining VTLN and SRN.
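The abstract mentions estimating glottal closure instants (GCIs) with zero-frequency filtering. The following is a minimal sketch of that step as it is commonly described (difference the signal, pass it through a cascade of two zero-frequency resonators, remove the polynomial trend with a local mean, and pick positive-going zero crossings). It is not the paper's implementation: the function name `zff_epochs`, the parameters `win_ms` and `passes`, and the fixed trend-removal window are assumptions made for illustration, and the sketch assumes the excitation has negative polarity at the closure instants.

```python
import numpy as np

def zff_epochs(signal, fs, win_ms=15.0, passes=3):
    """Return sample indices of candidate GCIs (hypothetical helper).

    win_ms is an assumed trend-removal window, roughly 1 to 2 average
    pitch periods; in practice it would be derived from the signal.
    """
    x = np.asarray(signal, dtype=np.float64)

    # First-order difference removes any DC offset in the recording.
    dx = np.concatenate(([0.0], np.diff(x)))

    # Each zero-frequency resonator y[n] = 2y[n-1] - y[n-2] + x[n] is a
    # double integrator, so a cascade of two equals four cumulative sums.
    y = dx
    for _ in range(4):
        y = np.cumsum(y)

    # Remove the polynomial trend by repeatedly subtracting a local mean.
    half = int(round(win_ms * 1e-3 * fs / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")

    # Positive-going zero crossings of the filtered signal mark the epochs.
    idx = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1

    # Discard the edges, where zero padding corrupts the local mean.
    margin = passes * half + 1
    return idx[(idx >= margin) & (idx < len(y) - margin)]


# Usage on a synthetic excitation: negative impulses every 10 ms at 8 kHz.
fs = 8000
period = 80
excitation = np.zeros(4000)
true_gci = np.arange(40, 4000, period)
excitation[true_gci] = -1.0
epochs = zff_epochs(excitation, fs)
```

Because long cumulative sums grow polynomially, a sketch like this is only suitable for short segments; longer recordings would typically be processed in blocks or with a numerically safer formulation.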
Year: 2018
DOI: 10.1016/j.dsp.2018.05.003
Venue: Digital Signal Processing
Keywords: Pitch-adaptive spectral estimation, TANDEM STRAIGHT, Vocal-tract length normalization, Speaking-rate normalization, Glottal closure instants, Zero-frequency filtering
Field: Mel-frequency cepstrum, Spectral density estimation, Normalization (statistics), Age groups, Pattern recognition, Pitch period, Filter (signal processing), Speech recognition, Artificial intelligence, Mathematics, Test set
DocType: Journal
Volume: 79
ISSN: 1051-2004
Citations: 3
PageRank: 0.38
References: 19
Authors: 5
Name | Order | Citations | PageRank
S. Shahnawazuddin | 1 | 64 | 17.34
Adiga, N. | 2 | 10 | 3.60
Hemant Kumar Kathania | 3 | 19 | 4.27
G. Pradhan | 4 | 88 | 13.14
Rohit Sinha | 5 | 231 | 30.54