Title
Isolated Word Recognition with Audio Derivation and CNN
Abstract
We present a speaker-independent approach to isolated word recognition that combines audio derivation with a convolutional neural network (CNN). Instead of the traditional, sophisticated phonetic features extracted from audio, we use the spectrogram of the audio as training data for the CNN, which turns isolated word recognition into an image recognition problem. Deep learning demands large amounts of training data, but building such corpora reduces the efficiency of the system. We therefore present an audio-level data derivation approach that makes a high recognition rate attainable from a small amount of collected seed audio. The derivation applies formant perturbation, pitch shifting, time stretching, and volume perturbation while preserving the semantic content. The approach thus reduces the amount of seed data that deep learning requires for isolated word recognition. Results show that derived data significantly improves accuracy, and that only 7.57%-15.14% of the seed data is needed to reach the same level of accuracy.
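As an illustration of the audio-level derivation the abstract describes, the following is a minimal sketch that derives several variants from one seed recording using pitch shifting, time stretching, and volume perturbation. It is not the authors' implementation: the abstract does not say which algorithms or parameter ranges were used, so the overlap-add (OLA) time stretch, the function names, and the ranges in `derive` are all assumptions chosen for illustration. Formant perturbation is omitted because it needs a spectral-envelope model that does not fit in a short example.

```python
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def volume_perturb(samples, gain):
    """Scale the amplitude by `gain`, clipping to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def resample_linear(samples, factor):
    """Read the signal `factor` times faster using linear interpolation.

    The output has about len(samples) / factor samples; played back at the
    original sample rate, both duration and pitch change.
    """
    last = len(samples) - 1
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def time_stretch_ola(samples, rate, frame=256, hop_out=64):
    """Crude overlap-add time stretch: rate > 1 shortens, rate < 1 lengthens.

    Windowed frames are read every rate * hop_out samples and written every
    hop_out samples, so duration changes while local pitch is roughly kept.
    """
    hop_in = rate * hop_out
    window = hann(frame)
    n_frames = max(1, int((len(samples) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        start_in = int(round(k * hop_in))
        start_out = k * hop_out
        for j in range(frame):
            idx = start_in + j
            if idx >= len(samples):
                break
            out[start_out + j] += window[j] * samples[idx]
            norm[start_out + j] += window[j]
    # Divide by the summed window weights so the overall level is preserved.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def pitch_shift(samples, semitones):
    """Shift pitch by stretching in time, then resampling back to length."""
    ratio = 2.0 ** (semitones / 12.0)
    stretched = time_stretch_ola(samples, 1.0 / ratio)
    return resample_linear(stretched, ratio)


def derive(samples, rng, n_variants=4):
    """Derive perturbed variants of one seed recording.

    The perturbation ranges below are illustrative assumptions, not values
    taken from the paper.
    """
    variants = []
    for _ in range(n_variants):
        y = pitch_shift(samples, rng.uniform(-2.0, 2.0))
        y = time_stretch_ola(y, rng.uniform(0.8, 1.25))
        y = volume_perturb(y, rng.uniform(0.5, 1.5))
        variants.append(y)
    return variants
```

Calling `derive(seed, random.Random(0))` on a list of float samples in [-1.0, 1.0] returns four perturbed copies of the seed. In the pipeline the abstract outlines, each derived recording would then be converted to a spectrogram image before being used to train the CNN.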
Year
2017
DOI
10.1109/ICTAI.2017.00060
Venue
2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI)
Keywords
convolutional neural network, audio derivation, limited training set, spectrogram, isolated word recognition
Field
Audio time-scale/pitch modification, Pattern recognition, Computer science, Spectrogram, Convolutional neural network, Word recognition, Feature extraction, Artificial intelligence, Deep learning, Formant, Hidden Markov model
DocType
Conference
ISSN
1082-3409
ISBN
978-1-5386-3877-4
Citations
0
PageRank
0.34
References
0
Authors
4
Name
Order
Citations
PageRank
Jingjing Zhang113919.09
Shuangjiu Xiao24114.18
Huichao Zhang300.34
Lan Jiang400.34