Title
Partial data permutation for training deep neural networks
Abstract
Random data permutation is considered a best practice for training deep neural networks. When the input is large, permuting the full dataset is costly and limits scaling on distributed systems. Some practitioners resort to partial or no permutation, which may result in poor convergence. We propose a partitioned data permutation scheme as a low-cost alternative to full data permutation. By analyzing their entropy, we show that the two sampling schemes are asymptotically identical. We also show that with minibatch SGD, both sampling schemes produce unbiased estimators of the true gradient and satisfy the same bound on the second moment of the gradient; thus they have similar convergence properties. Our experiments confirm that, in practice, SGD achieves similar training performance with both sampling schemes. We further show that, due to inherent randomness in training such as data augmentation and dropout, sampling schemes even faster than partial permutation, such as sequential sampling, can achieve good performance. However, when no extra randomness is present in training, sampling schemes with low entropy can indeed degrade performance significantly.
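The abstract does not spell out how the partitioned scheme is implemented; the sketch below is a minimal illustration, assuming the scheme splits the index range into contiguous partitions and permutes indices only within each partition (the function names and the num_partitions parameter are illustrative, not taken from the paper).

import numpy as np

def full_permutation_indices(n, rng):
    # Full data permutation: shuffle all n sample indices each epoch.
    return rng.permutation(n)

def partitioned_permutation_indices(n, num_partitions, rng):
    # Partitioned (partial) permutation (assumed interpretation): split the
    # index range into contiguous partitions, permute indices within each
    # partition, and visit the partitions in a random order.
    partitions = np.array_split(np.arange(n), num_partitions)
    order = rng.permutation(num_partitions)
    return np.concatenate([rng.permutation(partitions[p]) for p in order])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, batch_size = 16, 4
    idx = partitioned_permutation_indices(n, num_partitions=4, rng=rng)
    # Minibatch SGD would consume the permuted indices in fixed-size chunks.
    for start in range(0, n, batch_size):
        print(idx[start:start + batch_size])

The cost advantage comes from never materializing a permutation of the full dataset: each worker only needs to shuffle its own partition, which is what makes the scheme attractive on distributed systems.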
Year
2020
DOI
10.1109/CCGrid49817.2020.00-17
Venue
2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
DocType
Conference
ISBN
978-1-7281-6095-5
Citations
0
PageRank
0.34
References
0
Authors
3
Name             Order  Citations  PageRank
Guojing Cong     1      354        33.48
Li Zhang         2      20521      22.06
Chih-Chieh Yang  3      127        13.88