Abstract |
---|
We present a speaker-independent isolated word recognition approach that combines audio derivation with a convolutional neural network (CNN). In contrast to the sophisticated phonetic features traditionally extracted from audio, we use the spectrogram of the audio as training data for the CNN, which transforms the isolated word recognition problem into an image recognition problem. Deep learning demands large amounts of training data, but building such corpora is costly and reduces system-development efficiency. We present an audio-level data derivation approach that makes it possible to obtain a high recognition rate from only a small set of collected seed audio. Derivation is achieved by formant perturbation, pitch shifting, time stretching, and volume perturbation while preserving semantic content. The approach presented in this paper reduces the amount of seed data that deep learning requires for isolated word recognition. Results show that the accuracy improvement with derived data is significant, and only 7.57%-15.14% of the seed data is needed to achieve the same level of accuracy. |
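The abstract's data-derivation idea can be sketched in code: generate several perturbed copies of each seed utterance so the CNN sees more training variety. Below is a minimal pure-Python illustration of two of the named transforms, volume perturbation and time stretching; the function names, parameter ranges, and the naive linear-interpolation resampler are illustrative assumptions, not the paper's actual implementation (which also applies formant perturbation and proper pitch shifting).

```python
# Illustrative sketch of audio-level data derivation (assumed details,
# not the paper's exact method): each seed waveform yields several
# volume-perturbed, time-stretched variants.
import random


def volume_perturb(samples, low=0.8, high=1.2, rng=None):
    """Scale the whole utterance by one random gain factor (assumed range)."""
    rng = rng or random.Random()
    gain = rng.uniform(low, high)
    return [s * gain for s in samples]


def time_stretch(samples, rate):
    """Naive linear-interpolation resampling: rate > 1 shortens the audio.
    (A real time stretcher preserves pitch, e.g. via a phase vocoder;
    this crude resample only demonstrates the derivation idea.)"""
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


def derive(seed, n_variants=4, rng=None):
    """Produce several perturbed copies of one seed utterance."""
    rng = rng or random.Random(0)
    variants = []
    for _ in range(n_variants):
        rate = rng.uniform(0.9, 1.1)  # assumed +/-10% stretch range
        variants.append(volume_perturb(time_stretch(seed, rate), rng=rng))
    return variants
```

Each derived variant keeps the word's semantic content while varying duration and loudness, which is what lets a small seed corpus stand in for a much larger one.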
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/ICTAI.2017.00060 | 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI) |
Keywords | Field | DocType |
---|---|---|
convolutional neural network, audio derivation, limited training set, spectrogram, isolated word recognition | Audio time-scale/pitch modification, Pattern recognition, Computer science, Spectrogram, Convolutional neural network, Word recognition, Feature extraction, Artificial intelligence, Deep learning, Formant, Hidden Markov model | Conference |
ISSN | ISBN | Citations |
---|---|---|
1082-3409 | 978-1-5386-3877-4 | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jingjing Zhang | 1 | 139 | 19.09 |
Shuangjiu Xiao | 2 | 41 | 14.18 |
Huichao Zhang | 3 | 0 | 0.34 |
Lan Jiang | 4 | 0 | 0.34 |