Improving children's mismatched ASR using structured low-rank feature projection. - Citegraph

Paper Info

Title
Improving children's mismatched ASR using structured low-rank feature projection.

Abstract
The work presented in this paper explores the issues in automatic speech recognition (ASR) of children’s speech using acoustic models trained on adults’ speech. In such contexts, the large acoustic mismatch between training and test data leads to highly degraded recognition rates. Even with vocal tract length normalization (VTLN), recognition performance in the mismatched case remains well below that of the matched case. Our earlier studies have shown that, for the commonly used mel-filterbank-based cepstral features, the acoustic mismatch is exacerbated by insufficient smoothing of pitch harmonics for child speakers. To address this problem, this paper proposes a structured low-rank projection of the feature vectors, applied both before learning the acoustic models and before decoding. To accomplish this, a low-rank transform is first learned on the training data (adults’ speech). Any dimensionality reduction technique that depends on the variance of the training data may be used for this purpose; in this work, principal component analysis (PCA) and heteroscedastic linear discriminant analysis (HLDA) have been explored. When the derived low-rank projection is applied in the mismatched testing case, it alleviates the pitch-dependent mismatch. The proposed approach provides a relative recognition performance improvement of 35% over the VTLN-included baseline for children’s mismatched ASR using acoustic models based on hidden Markov models (HMM) whose observation densities are modeled by Gaussian mixture models (GMM). In addition, acoustic modeling approaches based on subspace GMM (SGMM) and deep neural networks (DNN) have also been explored. Projecting the data to a lower-dimensional subspace is found to be effective in those frameworks as well.
In the case of SGMM- and DNN-based systems, the proposed approach is observed to yield relative recognition performance improvements of 33% and 21%, respectively, over their corresponding baselines.
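The abstract describes learning a variance-based low-rank transform on adult training features and applying the same transform to both training and test features. The sketch below illustrates that idea with PCA, one of the two techniques named (HLDA is not shown). It is a minimal illustration, not the authors' implementation: the 39-dimensional feature size, the retained rank of 24, the helper names `learn_low_rank_projection` and `project`, and the synthetic random data are all hypothetical assumptions, and the record does not say whether the paper keeps the reduced dimensionality or maps back to the original space.

```python
import numpy as np


def learn_low_rank_projection(train_feats: np.ndarray, rank: int):
    """Learn a PCA-based low-rank projection from training features.

    train_feats: (num_frames, dim) array of adult-speech feature vectors.
    rank: number of principal directions to keep (hypothetical choice).
    Returns the training mean and a (dim, rank) projection matrix.
    """
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Sample covariance of the training features (dim x dim).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort directions by decreasing variance
    basis = eigvecs[:, order[:rank]]   # keep the top-`rank` directions
    return mean, basis


def project(feats: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project feature vectors onto the learned low-rank subspace."""
    return (feats - mean) @ basis


# Synthetic stand-ins for 39-dimensional cepstral frames (assumed size).
rng = np.random.default_rng(0)
adult_train = rng.standard_normal((5000, 39))
child_test = rng.standard_normal((800, 39))

# The transform is learned on adult (training) data only.
mean, basis = learn_low_rank_projection(adult_train, rank=24)
train_low = project(adult_train, mean, basis)  # used to train acoustic models
test_low = project(child_test, mean, basis)    # same transform before decoding
print(train_low.shape, test_low.shape)  # (5000, 24) (800, 24)
```

The key design point the sketch mirrors is that the subspace is estimated only from the adults' training data; the children's test data is never used to fit it, which is what lets the projection discard low-variance directions before mismatched decoding.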

Year: 2018
DOI: 10.1016/j.specom.2018.11.001
Venue: Speech Communication
Keywords: Children’s speech recognition, Pitch variation, Low-rank feature projection, PCA, HLDA, SGMM, DNN
Field: Normalization (statistics), Dimensionality reduction, Pattern recognition, Subspace topology, Computer science, Cepstrum, Speech recognition, Smoothing, Test data, Artificial intelligence, Hidden Markov model, Mixture model
DocType: Journal
Volume: 105
ISSN: 0167-6393
Citations: 2
PageRank: 0.45
References: 32
Authors: 4

Authors (4 rows)

Name	Order	Citations	PageRank
S. Shahnawazuddin	1	64	17.34
Hemant Kumar Kathania	2	19	4.27
Abhishek Dey	3	5	0.85
Rohit Sinha	4	231	30.54