Title
Deep Convolutional Neural Networks for Large-scale Speech Tasks.
Abstract
Convolutional Neural Networks (CNNs) are an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations which exist in signals. Since speech signals exhibit both of these properties, we hypothesize that CNNs are a more effective model for speech than Deep Neural Networks (DNNs). In this paper, we explore applying CNNs to large vocabulary continuous speech recognition (LVCSR) tasks. First, we determine the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks. Specifically, we focus on how many convolutional layers are needed, what an appropriate number of hidden units is, and what the best pooling strategy is. Second, we investigate how to incorporate speaker-adapted features, which cannot directly be modeled by CNNs as they do not obey locality in frequency, into the CNN framework. Third, given the importance of sequence training for speech tasks, we introduce a strategy to use ReLU+dropout during Hessian-free sequence training of CNNs. Experiments on 3 LVCSR tasks indicate that a CNN with the proposed speaker-adapted and ReLU+dropout ideas allows for a 12%–14% relative improvement in WER over a strong DNN system, achieving state-of-the-art results on these 3 tasks.
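The abstract's claim that convolution plus pooling reduces spectral variation can be illustrated with a toy example. The sketch below is not the paper's architecture: the function names, the 8-band frames, the peak-detector kernel, and the pool size of 3 are all assumptions chosen for illustration. It convolves along the frequency axis, applies ReLU, and max-pools, so that a spectral peak shifted by one band yields the same pooled output.

```python
# Toy sketch (hypothetical names and values, not the paper's model):
# convolution along frequency + ReLU + max pooling on one spectral frame.

def conv1d_valid(signal, kernel):
    """Slide the kernel along the frequency axis (cross-correlation, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified linear unit: clamp negative responses to zero."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def frequency_feature(frame, kernel, pool_size):
    """Convolve, rectify, then pool one frame of spectral energies."""
    return max_pool(relu(conv1d_valid(frame, kernel)), pool_size)

# Two frames with the same spectral peak, shifted by one frequency band.
frame_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
frame_b = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [-0.5, 1.0, -0.5]  # simple peak detector (assumed, not learned)

pooled_a = frequency_feature(frame_a, kernel, pool_size=3)
pooled_b = frequency_feature(frame_b, kernel, pool_size=3)
print(pooled_a, pooled_b)
```

The invariance is only local: it holds here because both peak responses fall inside the same pooling window, and a larger shift that crosses a window boundary would change the pooled output.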
Year: 2015
DOI: 10.1016/j.neunet.2014.08.005
Venue: Neural Networks
Keywords: Deep learning, Neural networks, Speech recognition
Field: Locality, Convolutional neural network, Computer science, Pooling, Speech recognition, Artificial intelligence, Deep learning, Artificial neural network, Vocabulary, Machine learning, Deep neural networks
DocType: Journal
Volume: 64
Issue: 1
ISSN: 0893-6080
Citations: 89
PageRank: 3.39
References: 26
Authors (7)
Name                   Order  Citations  PageRank
Tara N. Sainath        1      3497       232.43
B. Kingsbury           2      4175       335.43
George Saon            3      825        80.99
Hagen Soltau           4      795        67.33
Abdel-rahman Mohamed   5      3772       266.13
George E. Dahl         6      4734       416.42
Bhuvana Ramabhadran    7      1779       153.83