Title
Variance Reduction in SGD by Distributed Importance Sampling
Abstract
Humans are able to accelerate their learning by selecting training materials that are the most informative and at the appropriate level of difficulty. We propose a framework for distributing deep learning in which one set of workers searches in parallel for the most informative examples while a single worker updates the model on examples selected by importance sampling. The model is thus updated with an unbiased estimate of the gradient, whose variance is minimized when the sampling proposal is proportional to the L2-norm of the per-example gradient. We show experimentally that this method reduces gradient variance even in a setting where the cost of synchronization across machines cannot be ignored, and where the importance weights are not updated instantly across the training set.
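A minimal sketch of the sampling scheme described in the abstract, assuming per-example gradients are already available (in the paper they would be computed by the search workers); all names here are illustrative, not taken from the paper:

```python
import numpy as np

def importance_sampled_gradient(grads, rng, batch_size=32):
    """Unbiased minibatch gradient estimate via importance sampling.

    grads: array of shape (N, D) holding per-example gradients.
    Sampling probabilities are proportional to the per-example L2
    gradient norms, the proposal that minimizes estimator variance.
    """
    norms = np.linalg.norm(grads, axis=1)
    probs = norms / norms.sum()                     # proposal q_i ∝ ||g_i||_2
    idx = rng.choice(len(grads), size=batch_size, p=probs)
    # Reweight each sample by 1 / (N * q_i) so the expectation of the
    # estimate equals the full-batch mean gradient (unbiasedness).
    weights = 1.0 / (len(grads) * probs[idx])
    return (weights[:, None] * grads[idx]).mean(axis=0)

# Toy usage: the importance-sampled estimate matches the true mean
# gradient in expectation, with lower variance than uniform sampling.
rng = np.random.default_rng(0)
grads = rng.normal(size=(1000, 10)) * rng.gamma(1.0, 2.0, size=(1000, 1))
print(importance_sampled_gradient(grads, rng))
print(grads.mean(axis=0))
```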
Year
2015
Venue
CoRR
Field
Training set, Data mining, Minimum-variance unbiased estimator, Importance sampling, Synchronization, Computer science, Sampling (statistics), Artificial intelligence, Deep learning, Variance reduction, Machine learning
DocType
Journal
Volume
abs/1511.06481
Citations
13
PageRank
0.66
References
6
Authors
5
Name, Order, Citations, PageRank
Guillaume Alain, 1, 306, 21.77
Alex Lamb, 2, 268, 18.84
Chinnadhurai Sankar, 3, 32, 4.47
Aaron C. Courville, 4, 6671, 348.46
Yoshua Bengio, 5, 42677, 3039.83