Abstract |
---|
It is well known that state-of-the-art speech recognition systems based on deep neural networks (DNNs) can greatly improve performance over conventional GMM-HMM systems. The corresponding cost, however, is immense training time due to the enormous number of DNN parameters. Unfortunately, the minibatch-based back-propagation (BP) algorithm used in DNN training is difficult to parallelize because of its frequent model updates. In this paper we describe asynchronous stochastic gradient descent (ASGD), an effective approximation of BP that is used to parallelize computation across multiple GPUs. The approach manages multiple GPUs that work asynchronously, each computing gradients and updating the global model parameters. Experimental results show a 3.2x speed-up on 4 GPUs compared with a single GPU, without any loss in recognition performance. |
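As a rough illustration of the asynchronous update scheme the abstract describes, the sketch below simulates ASGD with Python threads standing in for the GPUs, each computing gradients on its own minibatches and immediately applying them to a shared global model. The toy linear-regression model, learning rate, locking detail, and all other specifics are assumptions for illustration only and are not taken from the paper's multi-GPU implementation.

```python
# Minimal single-machine sketch of asynchronous SGD (ASGD).
# Worker threads stand in for GPUs: each repeatedly computes a gradient
# on its own minibatch and applies it to the shared global parameters
# without waiting for the other workers. Illustration only; not the
# authors' implementation. Model and hyperparameters are made up.
import threading
import numpy as np

true_w = np.array([2.0, -3.0, 0.5])   # target weights of a toy regression task
global_w = np.zeros(3)                # shared "global model" parameters
lock = threading.Lock()               # guards the read-modify-write of global_w

def worker(seed, num_steps=200, batch_size=32, lr=0.05):
    global global_w
    rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        # Each worker draws its own minibatch (stands in for a data shard).
        X = rng.normal(size=(batch_size, 3))
        y = X @ true_w + 0.01 * rng.normal(size=batch_size)

        # Gradient is computed against a possibly stale snapshot of the model.
        w_snapshot = global_w.copy()
        grad = (2.0 / batch_size) * X.T @ (X @ w_snapshot - y)

        # Asynchronous update: apply the gradient as soon as it is ready.
        with lock:
            global_w = global_w - lr * grad

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("learned:", np.round(global_w, 3), "target:", true_w)
```

Because every worker updates the global parameters as soon as its gradient is ready, no worker stalls waiting for the slowest one; the price is that gradients may be computed from slightly stale parameters, which is the approximation the abstract refers to.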
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/ICASSP.2013.6638950 | ICASSP |
Keywords | Field | DocType
---|---|---|
asynchronous sgd, deep neural network, stochastic processes, global model parameters, speech recognition, multi-gpu, gpu parallelization, backpropagation, dnn training, gradient methods, asynchronous stochastic gradient descent, neural nets, servers, computational modeling, data models | Data modeling, Computer science, Server, Artificial intelligence, Artificial neural network, Asynchronous communication, Stochastic gradient descent, Pattern recognition, Parallel computing, Stochastic process, Backpropagation, Machine learning, Global model | Conference
Volume | Issue | ISSN
---|---|---|
null | null | 1520-6149
Citations | PageRank | References
---|---|---|
46 | 2.35 | 2
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shanshan Zhang | 1 | 53 | 4.24 |
Ce Zhang | 2 | 50 | 3.17 |
Zhao You | 3 | 67 | 9.39 |
Rong Zheng | 4 | 50 | 3.50 |
Bo Xu | 5 | 130 | 9.43 |