Title
Asynchronous stochastic gradient descent for DNN training
Abstract
It is well known that state-of-the-art speech recognition systems based on deep neural networks (DNNs) greatly outperform conventional GMM-HMM systems. The corresponding price, however, is an immense training cost caused by the enormous number of DNN parameters. Unfortunately, the minibatch-based back-propagation (BP) algorithm used in DNN training is difficult to parallelize because of its frequent model updates. In this paper we describe an effective approximation of BP, asynchronous stochastic gradient descent (ASGD), which parallelizes the computation across multiple GPUs. The approach lets multiple GPUs work asynchronously to compute gradients and update the global model parameters. Experimental results show a 3.2x speed-up on 4 GPUs compared with a single GPU, without any loss of recognition performance.
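The abstract describes the core ASGD idea: several workers independently compute minibatch gradients and apply them to a shared global model without waiting for one another. Below is a minimal single-machine sketch in Python of that scheme, in which threads stand in for GPUs, the global model is a shared NumPy array updated lock-free (Hogwild-style), and the toy least-squares task, learning rate, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of asynchronous SGD: workers (standing in for GPUs) read a
# possibly stale copy of the global model, compute a minibatch gradient, and
# apply it to the global parameters without synchronization.
# Hypothetical toy setup, not the paper's DNN training pipeline.
import threading
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression y = X @ w_true + noise.
X = rng.normal(size=(4096, 32))
w_true = rng.normal(size=32)
y = X @ w_true + 0.01 * rng.normal(size=4096)

# Global model parameters shared by all workers.
w_global = np.zeros(32)
lr = 0.01
batch_size = 64
steps_per_worker = 500

def worker(seed: int) -> None:
    """Repeatedly: snapshot the global model, compute a minibatch gradient,
    and apply it to the global model asynchronously."""
    global w_global
    local_rng = np.random.default_rng(seed)
    for _ in range(steps_per_worker):
        idx = local_rng.integers(0, X.shape[0], size=batch_size)
        w = w_global.copy()              # possibly stale snapshot
        err = X[idx] @ w - y[idx]
        grad = X[idx].T @ err / batch_size
        w_global -= lr * grad            # lock-free (Hogwild-style) update

# Four worker threads, mirroring the 4-GPU setup reported in the abstract.
threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("parameter error:", np.linalg.norm(w_global - w_true))
```

The trade-off illustrated here is gradient staleness: each worker may update the global model using a gradient computed from slightly outdated parameters, which is what makes ASGD an approximation of synchronous minibatch BP.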
Year
2013
DOI
10.1109/ICASSP.2013.6638950
Venue
ICASSP
Keywords
asynchronous sgd, deep neural network, stochastic processes, global model parameters, speech recognition, multi-GPU, GPU parallelization, backpropagation, DNN training, gradient methods, asynchronous stochastic gradient descent, neural nets, servers, computational modeling, data models
Field
Data modeling, Computer science, Server, Artificial intelligence, Artificial neural network, Asynchronous communication, Stochastic gradient descent, Pattern recognition, Parallel computing, Stochastic process, Backpropagation, Machine learning, Global model
DocType
Conference
Volume
null
Issue
null
ISSN
1520-6149
Citations
46
PageRank
2.35
References
2
Authors
5
Name | Order | Citations | PageRank
Shanshan Zhang | 1 | 53 | 4.24
Ce Zhang | 2 | 50 | 3.17
Zhao You | 3 | 67 | 9.39
Rong Zheng | 4 | 50 | 3.50
Bo Xu | 5 | 130 | 9.43