Title
DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Abstract
Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network under the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters from zero, and retrain the whole dense network. Experiments show that DSD training can improve the performance of a wide range of CNNs, RNNs and LSTMs on the tasks of image classification, caption generation and speech recognition. On ImageNet, DSD improved the Top-1 accuracy of GoogLeNet by 1.1%, VGG-16 by 4.3%, ResNet-18 by 1.2% and ResNet-50 by 1.1%. On the WSJ’93 dataset, DSD improved DeepSpeech and DeepSpeech2 WER by 2.0% and 1.1%, respectively. On the Flickr-8K dataset, DSD improved the NeuralTalk BLEU score by over 1.7. DSD is easy to use in practice: at training time, DSD incurs only one extra hyper-parameter, the sparsity ratio in the S step. At testing time, DSD doesn’t change the network architecture or incur any inference overhead. The consistent and significant performance gains in the DSD experiments show that current training methods are inadequate for finding the best local optimum, while DSD effectively achieves superior optimization performance and finds a better solution. DSD models are available to download at https://songhan.github.io/DSD.
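The D-S-D flow described in the abstract maps naturally onto an ordinary training loop plus a magnitude-pruning mask. Below is a minimal PyTorch sketch of that idea, assuming an image-classification model and data loader; the helper names (`magnitude_masks`, `apply_masks`, `dsd`), the SGD settings, and the 50% sparsity ratio are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of the Dense-Sparse-Dense (DSD) training flow from the
# abstract. Model, data loader, optimizer settings, and the sparsity ratio
# are illustrative assumptions, not the authors' published hyper-parameters.
import torch
import torch.nn as nn


def magnitude_masks(model, sparsity):
    """Build per-layer masks that drop the smallest-magnitude weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:          # skip biases / normalization parameters
            continue
        k = int(sparsity * param.numel())
        if k == 0:
            continue
        threshold = param.detach().abs().flatten().kthvalue(k).values
        masks[name] = (param.detach().abs() > threshold).float()
    return masks


def apply_masks(model, masks):
    """Zero the pruned weights so the network stays sparse while retraining."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])


def train(model, loader, epochs, masks=None):
    """Generic training loop; re-applies the masks after every step if given."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            if masks is not None:
                apply_masks(model, masks)


def dsd(model, loader, sparsity=0.5, epochs=10):
    train(model, loader, epochs)                  # D: dense training
    masks = magnitude_masks(model, sparsity)
    apply_masks(model, masks)
    train(model, loader, epochs, masks=masks)     # S: sparse retraining
    # re-D: the pruned weights are currently zero; dropping the masks lets
    # them resume training from zero while the whole dense network retrains.
    train(model, loader, epochs)
    return model
```

In this sketch the only extra hyper-parameter is the sparsity ratio used to build the masks, and inference uses the ordinary dense model, consistent with the abstract's claim of no architecture change or inference overhead.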
Year: 2017
Venue: International Conference on Learning Representations (ICLR)
Field: Pattern recognition, Local optimum, Inference, Computer science, Network architecture, Artificial intelligence, Contextual image classification, Machine learning, Deep neural networks
DocType: Conference
Citations: 14
PageRank: 0.59
References: 15
Authors: 12
Name                 Order   Citations   PageRank
Song Han             1       2102        79.81
Jeff Pool            2       70          5.19
Sharan Narang        3       335         14.44
Huizi Mao            4       1279        41.30
Enhao Gong           5       25          1.48
Tang S               6       70          8.37
Erich Elsen          7       551         29.33
Peter Vajda          8       80          5.14
Manohar Paluri       9       1237        56.52
John Tran            10      735         26.86
Bryan C. Catanzaro   11      1191        75.56
William J. Dally     12      11782       1460.14