Title
Improving the performance of keyword spotting system for children's speech through prosody modification.
Abstract
Keyword spotting (KWS) refers to searching for words of interest in a speech sequence. Numerous techniques have been proposed over the years for effectively spotting keywords in adults' speech. However, little work has been reported on KWS for children's speech. Speech data from adult and child speakers differ significantly owing to physiological differences between the two groups. Consequently, the performance of a KWS system trained on adults' speech degrades severely when used by children because of the acoustic mismatch. In this paper, we present our efforts towards improving the performance of keyword spotting systems for children's speech under a limited-data scenario. To this end, we explore prosody modification to reduce the acoustic mismatch resulting from differences in pitch and speaking rate. The prosody modification technique explored in this paper is based on glottal closure instant (GCI) events, with the GCI locations computed using zero-frequency filtering (ZFF). Further, we present two different ways of applying prosody modification effectively. In the first, prosody modification is applied to the children's speech test set prior to the decoding step in order to improve recognition performance. Alternatively, we apply prosody modification to the training data from adult speakers; the original and the prosody-modified adults' speech data are then pooled before learning the statistical parameters of the KWS system. The experimental evaluations presented in this paper show that both approaches yield significantly improved performance for children's speech. Prosody-modification-based data augmentation also improves performance on adults' speech.
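The abstract names zero-frequency filtering as the way GCI locations are obtained. The following is a minimal Python sketch of the generic ZFF idea, not the authors' implementation: the signal is differenced, passed through four poles at z = 1 (four cumulative sums), detrended by repeated local-mean subtraction, and GCIs are taken as the positive-going zero crossings. The function name `zff_gci`, the parameters `mean_pitch_ms` and `passes`, the window of about 1.5 pitch periods, and the edge trimming are all assumptions made for illustration.

```python
import numpy as np


def zff_gci(speech, fs, mean_pitch_ms=10.0, passes=3):
    """Estimate GCI sample indices with a basic zero-frequency filter.

    Illustrative sketch only; mean_pitch_ms and passes are assumed
    tuning parameters, not values taken from the paper.
    """
    s = np.asarray(speech, dtype=np.float64)

    # Difference the signal to remove any DC offset.
    x = np.diff(s, prepend=s[0])

    # Two cascaded zero-frequency resonators amount to four poles at
    # z = 1, which is the same as four successive cumulative sums.
    y = x
    for _ in range(4):
        y = np.cumsum(y)

    # The output grows polynomially, so remove the trend by repeatedly
    # subtracting a local mean over roughly 1.5 average pitch periods.
    half = int(round(0.75 * mean_pitch_ms * 1e-3 * fs))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")

    # Candidate GCIs are the negative-to-positive zero crossings
    # (this assumes the usual negative-going excitation polarity).
    idx = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1

    # Zero-padded convolution corrupts `half` samples per pass at each
    # end, so discard crossings that fall inside those edge regions.
    margin = passes * half
    return idx[(idx > margin) & (idx < len(y) - margin)]
```

Calling `zff_gci(signal, fs)` on a voiced segment returns an integer array of sample indices. In a GCI-based prosody modification scheme, such indices would serve as anchor points for rescaling pitch and duration; with inverted signal polarity the crossings land roughly half a period away from the true instants, so polarity should be checked first.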
Year: 2019
DOI: 10.1016/j.dsp.2018.12.011
Venue: Digital Signal Processing
Keywords: Keyword spotting, Children's speech, Prosody modification, Glottal closure instants, Data augmentation
Field: Training set, Prosody, Pattern recognition, Filter (signal processing), Speech recognition, Keyword spotting, Artificial intelligence, Decoding methods, Spotting, Mathematics, Test set
DocType: Journal
Volume: 86
ISSN: 1051-2004
Citations: 2
PageRank: 0.37
References: 24
Authors: 3
Name | Order | Citations | PageRank
S. Shahnawazuddin | 1 | 64 | 17.34
Karabi Maity | 2 | 2 | 0.37
G. Pradhan | 3 | 88 | 13.14