Abstract |
---|
Convolutional Neural Networks (CNNs) have demonstrated powerful acoustic modelling capabilities due to their ability to account for structural locality in the feature space, and recent work has shown that CNNs often outperform fully connected Deep Neural Networks (DNNs) on TIMIT and large vocabulary continuous speech recognition (LVCSR) tasks. In this paper, we perform a detailed empirical study of CNNs under a low-resource condition in which only 10 hours of training data are available. We find that a two-dimensional convolutional structure performs best, and we emphasize the importance of considering both time and spectrum when modelling acoustic patterns. We report detailed error rates across a wide variety of model structures and show that CNNs consistently outperform fully connected DNNs for this task. |
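
The abstract favours a two-dimensional convolutional structure over time and spectrum. The minimal pure-Python sketch below illustrates that idea only and is not the authors' model. It compares a 2-D kernel, which pools evidence from neighbouring frames and neighbouring filterbank channels, with a height-1 kernel that slides along frequency within a single frame. The frequency-only variant is an assumed point of comparison. The function names, kernel sizes, and feature values are hypothetical. Real CNN acoustic models stack many learned filters with nonlinearities and pooling.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (what CNN layers compute) of a
    time x frequency feature map x with a single kernel k.

    Each output value covers a local patch spanning len(k) frames (time)
    and len(k[0]) adjacent channels (frequency).
    """
    t_in, f_in = len(x), len(x[0])
    t_k, f_k = len(k), len(k[0])
    out: Matrix = []
    for t in range(t_in - t_k + 1):
        row = []
        for f in range(f_in - f_k + 1):
            acc = 0.0
            for dt in range(t_k):
                for df in range(f_k):
                    acc += x[t + dt][f + df] * k[dt][df]
            row.append(acc)
        out.append(row)
    return out


def conv1d_freq_valid(x: Matrix, k: List[float]) -> Matrix:
    """Frequency-only convolution (hypothetical baseline): a height-1 kernel,
    so each frame is filtered independently and no time context is used."""
    return conv2d_valid(x, [k])


# Toy 4-frame x 5-channel feature map (made-up values, not data from the paper).
feats = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 2.0, 4.0, 2.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
kernel_2d = [[1.0, 1.0], [1.0, 1.0]]  # hypothetical 2x2 (time x frequency) filter
kernel_1d = [1.0, 1.0]                # hypothetical width-2 frequency-only filter

y2 = conv2d_valid(feats, kernel_2d)   # 3 frames x 4 channels
y1 = conv1d_freq_valid(feats, kernel_1d)  # 4 frames x 4 channels
print(y2[0])  # [3.0, 9.0, 9.0, 3.0]
print(y1[0])  # [1.0, 3.0, 3.0, 1.0]
```

In this toy example, the 2-D output at each position mixes two consecutive frames. The frequency-only output never looks beyond the current frame, which is the locality distinction the abstract draws between modelling spectrum alone and modelling time and spectrum together.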
Year | Venue | Keywords |
---|---|---|
2015 | 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | Deep Neural Networks, Convolutional Neural Networks, Automatic Speech Recognition |
Field | DocType | ISSN |
---|---|---|
Neocognitron, TIMIT, Feature vector, Pattern recognition, Computer science, Convolutional neural network, Speech recognition, Time delay neural network, Artificial intelligence, Deep learning, Artificial neural network, Hidden Markov model | Conference | 1520-6149 |
Citations | PageRank | References |
---|---|---|
2 | 0.38 | 10 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
William Chan | 1 | 357 | 24.67 |
Ian R. Lane | 2 | 259 | 33.64 |