Title
Asynchronous stochastic gradient descent for DNN training
Abstract
It is well known that state-of-the-art speech recognition systems based on deep neural networks (DNNs) greatly outperform conventional GMM-HMM systems. The corresponding price, however, is an immense training cost caused by the enormous number of DNN parameters. Unfortunately, the minibatch-based back-propagation (BP) algorithm used in DNN training is difficult to parallelize because of its frequent model updates. In this paper we describe an effective approximation of BP, asynchronous stochastic gradient descent (ASGD), which parallelizes the computation across multiple GPUs. The approach lets multiple GPUs work asynchronously to compute gradients and update the global model parameters. Experimental results show a 3.2x speed-up on 4 GPUs compared with a single GPU, without any loss of recognition performance.
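The abstract describes the core ASGD idea: several workers independently compute minibatch gradients and apply them to a shared global model without waiting for one another. Below is a minimal single-machine sketch in Python of that scheme, in which threads stand in for GPUs, the global model is a shared NumPy array updated lock-free (Hogwild-style), and the toy least-squares task, learning rate, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of asynchronous SGD: workers (standing in for GPUs) read a
# possibly stale copy of the global model, compute a minibatch gradient, and
# apply it to the global parameters without synchronization.
# Hypothetical toy setup, not the paper's DNN training pipeline.
import threading
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression y = X @ w_true + noise.
X = rng.normal(size=(4096, 32))
w_true = rng.normal(size=32)
y = X @ w_true + 0.01 * rng.normal(size=4096)

# Global model parameters shared by all workers.
w_global = np.zeros(32)
lr = 0.01
batch_size = 64
steps_per_worker = 500

def worker(seed: int) -> None:
    """Repeatedly: snapshot the global model, compute a minibatch gradient,
    and apply it to the global model asynchronously."""
    global w_global
    local_rng = np.random.default_rng(seed)
    for _ in range(steps_per_worker):
        idx = local_rng.integers(0, X.shape[0], size=batch_size)
        w = w_global.copy()              # possibly stale snapshot
        err = X[idx] @ w - y[idx]
        grad = X[idx].T @ err / batch_size
        w_global -= lr * grad            # lock-free (Hogwild-style) update

# Four worker threads, mirroring the 4-GPU setup reported in the abstract.
threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("parameter error:", np.linalg.norm(w_global - w_true))
```

The trade-off illustrated here is gradient staleness: each worker may update the global model using a gradient computed from slightly outdated parameters, which is what makes ASGD an approximation of synchronous minibatch BP.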
Year
2013
DOI
10.1109/ICASSP.2013.6638950
Venue
ICASSP
Keywords
asynchronous sgd, deep neural network, stochastic processes, global model parameters, speech recognition, multi-GPU, GPU parallelization, backpropagation, DNN training, gradient methods, asynchronous stochastic gradient descent, neural nets, servers, computational modeling, data models
Field
Data modeling, Computer science, Server, Artificial intelligence, Artificial neural network, Asynchronous communication, Stochastic gradient descent, Pattern recognition, Parallel computing, Stochastic process, Backpropagation, Machine learning, Global model
DocType
Conference
Volume
null
Issue
null
ISSN
1520-6149
Citations
46
PageRank
2.35
References
2
Authors
5
Name | Order | Citations | PageRank
Shanshan Zhang | 1 | 53 | 4.24
Ce Zhang | 2 | 50 | 3.17
Zhao You | 3 | 67 | 9.39
Rong Zheng | 4 | 50 | 3.50
Bo Xu | 5 | 130 | 9.43