Abstract
---
Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training. Training with large batches can reduce these overheads; however, it can hurt the convergence of the algorithm and its generalization performance. In this work, we take a first step towards analyzing how the structure (width and depth) of a neural network affects the performance of large-batch training. We present new theoretical results which suggest that, for a fixed number of parameters, wider networks are more amenable to fast large-batch training than deeper ones. We provide extensive experiments on residual and fully-connected neural networks which suggest that wider networks can be trained using larger batches without incurring a convergence slowdown, unlike their deeper variants.
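As a rough illustration of the trade-off the abstract describes, the minimal sketch below (a toy example under assumed settings, not the paper's experimental setup) shows how the batch size controls the number of gradient updates per epoch in mini-batch SGD; in a data-parallel deployment each update corresponds to one synchronization round, which is the communication cost that larger batches amortize.

```python
# Toy illustration (not from the paper): mini-batch SGD on a linear model.
# In a distributed data-parallel setting, every gradient update implies one
# synchronization round, so updates per epoch ~ communication cost per epoch.
import numpy as np

def sgd_epoch(X, y, w, batch_size, lr=0.1):
    """Run one epoch of mini-batch SGD; return updated weights and #updates."""
    n = X.shape[0]
    idx = np.random.permutation(n)
    updates = 0
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        # Least-squares gradient on the current mini-batch.
        grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w = w - lr * grad
        updates += 1
    return w, updates

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))
w_true = rng.normal(size=10)
y = X @ w_true

for batch_size in (32, 1024):
    w, updates = sgd_epoch(X, y, np.zeros(10), batch_size)
    print(f"batch_size={batch_size:5d} -> {updates:3d} updates (sync rounds) per epoch")
```

With batch size 32 an epoch over 1024 examples takes 32 synchronized updates, while a single batch of 1024 takes one; the paper's question is when this reduction in communication can be exploited without the convergence slowdown the abstract mentions.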
Year | Venue | Keywords
---|---|---|
2018 | NeurIPS | neural networks, stochastic gradient descent, neural network, high frequency, first step

DocType | Volume | Citations
---|---|---|
Conference | abs/1806.03791 | 1

PageRank | References | Authors
---|---|---|
0.36 | 13 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Lingjiao Chen | 1 | 22 | 3.34 |
Hongyi Wang | 2 | 19 | 3.83 |
Jinman Zhao | 3 | 2 | 0.70
Dimitris S. Papailiopoulos | 4 | 797 | 40.11 |
Paraschos Koutris | 5 | 347 | 26.63 |