Title
Inefficiency of K-FAC for Large Batch Size Training
Abstract
There have been several recent works claiming record times for ImageNet training. These records are achieved by using large batch sizes during training to leverage parallel resources and reduce wall-clock time per training epoch. However, these solutions often require massive hyper-parameter tuning, an important cost that is frequently ignored. In this work, we perform an extensive analysis of large batch size training for two popular methods: Stochastic Gradient Descent (SGD) and the Kronecker-Factored Approximate Curvature (K-FAC) method. We evaluate the performance of these methods in terms of both wall-clock time and aggregate computational cost, and we study their hyper-parameter sensitivity by performing more than 512 experiments per batch size for each method. We perform experiments on multiple models on two datasets, CIFAR-10 and SVHN. The results show that beyond a critical batch size both K-FAC and SGD significantly deviate from ideal strong-scaling behavior, and that, contrary to common belief, K-FAC does not exhibit improved large-batch scalability compared to SGD.
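For readers unfamiliar with the two optimizers the abstract compares, the sketch below illustrates the core of a K-FAC step for a single fully connected layer, which is the main structural difference from a plain SGD step: the minibatch gradient is preconditioned by the inverses of two Kronecker factors estimated from the same batch. This is a minimal illustrative NumPy sketch, not the authors' implementation; all names, shapes, and the damping and learning-rate values are assumptions.

```python
import numpy as np

# Minimal sketch of one K-FAC update for a single linear layer.
# K-FAC approximates the layer's Fisher matrix as a Kronecker product
# F ~= A (x) G, where
#   A = E[a a^T]  -- second moment of the layer's inputs
#   G = E[g g^T]  -- second moment of gradients w.r.t. the layer's outputs
# so the preconditioned gradient is G^{-1} dW A^{-1}.

rng = np.random.default_rng(0)
batch, d_in, d_out = 512, 64, 32          # hypothetical sizes

a = rng.standard_normal((batch, d_in))    # layer inputs
g = rng.standard_normal((batch, d_out))   # backpropagated output gradients
dW = g.T @ a / batch                      # plain minibatch gradient, (d_out, d_in)

# Kronecker factors estimated from the same minibatch.
A = a.T @ a / batch                       # (d_in, d_in)
G = g.T @ g / batch                       # (d_out, d_out)

# Tikhonov damping keeps the factor inverses well conditioned.
lam = 1e-2
A_inv = np.linalg.inv(A + lam * np.eye(d_in))
G_inv = np.linalg.inv(G + lam * np.eye(d_out))

# (A (x) G)^{-1} vec(dW) == vec(G^{-1} dW A^{-1})
precond_grad = G_inv @ dW @ A_inv

lr = 0.1                                  # hypothetical learning rate
W = rng.standard_normal((d_out, d_in))    # current weights
W -= lr * precond_grad                    # one preconditioned step
```

Ideal strong scaling, the baseline the abstract refers to, would mean that doubling the batch size B halves the N/B iterations per epoch and hence the wall-clock time per epoch; the paper's finding is that past a critical batch size, both the SGD step and this preconditioned step stop tracking that 1/B line.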
Year
2019
Venue
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE
DocType
Journal
Volume
34
ISSN
2159-5399
Citations
0
PageRank
0.34
References
0
Authors
7
Name                Order  Citations  PageRank
Linjian Ma          1      0          0.68
Gabe Montague       2      0          0.34
Jiayu Ye            3      1          0.68
Zhewei Yao          4      31         10.58
Amir Gholami        5      66         12.99
Kurt Keutzer        6      5040       801.67
Michael W. Mahoney  7      3297       218.10