Title
A Study on Acoustic Modeling for Child Speech Based on Multi-Task Learning
Abstract
This paper describes a study on acoustic modeling of child speech for large-vocabulary speech recognition of Cantonese. The study is driven and enabled by a new speech corpus recently collected for developing acoustic assessment systems for speech sound disorders in Cantonese-speaking children. The corpus, named CUChild127, contains 127 Chinese words spoken by 1,500 pre-school children in Hong Kong. A small amount of manually transcribed child speech is used to initialize a GMM-HMM-based speech recognition system, which is then used to generate transcriptions for a large amount of training data. A multi-task learning approach is adopted to train a conventional DNN model and a time-delay neural network (TDNN) model. The primary and secondary tasks are context-dependent phone modeling for child speech and adult speech, respectively. The adult speech training data are drawn from an existing phonetically rich speech corpus. Experimental results show that the TDNN-based acoustic model significantly outperforms the DNN and GMM-HMM systems, and that multi-task learning further improves the TDNN model. The best syllable error rate attained in our experiments is 8.96%, obtained with the primary and secondary tasks weighted 0.8 and 0.2, respectively.
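The weighting of the primary (child) and secondary (adult) tasks can be illustrated as a weighted sum of two frame-level cross-entropy losses, each computed by its own output layer on top of shared hidden layers. This is a minimal sketch under stated assumptions: the function names, the plain-Python log-softmax, and the exact form of the combined objective are hypothetical and not taken from the authors' implementation; only the weights 0.8 and 0.2 come from the abstract.

```python
import math
from typing import Sequence

# Hypothetical sketch of a weighted multi-task objective; not the authors' code.

def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one frame's output logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(frames: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean frame-level cross-entropy between per-frame logits and target state indices."""
    total = 0.0
    for logits, target in zip(frames, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)

def multitask_loss(child_logits, child_targets, adult_logits, adult_targets,
                   w_primary: float = 0.8, w_secondary: float = 0.2) -> float:
    """Weighted sum of the primary (child phone) and secondary (adult phone) losses.

    Assumption: both output layers share the same hidden layers, so gradients
    from the adult task also update the shared representation."""
    return (w_primary * cross_entropy(child_logits, child_targets)
            + w_secondary * cross_entropy(adult_logits, adult_targets))

# Usage: one frame per task with uniform logits, so each loss is log(num_classes).
loss = multitask_loss([[0.0, 0.0]], [0], [[0.0, 0.0, 0.0]], [1])
```

In typical multi-task acoustic modeling setups of this kind, only the primary (child speech) output layer is used at decoding time; the secondary task serves to regularize the shared layers during training.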
Year: 2018
DOI: 10.1109/ISCSLP.2018.8706703
Venue: 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Keywords: Acoustics, Task analysis, Training, Speech recognition, Iterative decoding, Neural networks, Data models
Field: Speech corpus, Data modeling, Multi-task learning, Computer science, Word error rate, Speech recognition, Time delay neural network, Syllable, Artificial neural network, Acoustic model
DocType: Conference
ISBN: 978-1-5386-5627-3
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name, Order, Citations, PageRank
Jiarui Wang, 1, 7, 3.48
Si Ioi Ng, 2, 0, 0.68
Dehua Tao, 3, 0, 1.01
Wing Yee Ng, 4, 0, 0.68
Tan Lee, 5, 476, 74.69