Title
Comparative Study of Distributed Deep Learning Tools on Supercomputers.
Abstract
With the growing scale of datasets and neural networks, training time is increasing rapidly. Distributed parallel training has been proposed to accelerate deep neural network training, and most efforts have been made on top of GPU clusters. This paper focuses on the performance of distributed parallel training on the CPU clusters of supercomputer systems. Using resources of the Tianhe-2 supercomputer system, we conduct an extensive evaluation of popular deep learning tools, including Caffe, TensorFlow, and BigDL, and test several deep neural network models, including AutoEncoder, LeNet, AlexNet, and ResNet. The experimental results show that Caffe performs best in communication efficiency and scalability. BigDL is the fastest in computing speed, benefiting from its CPU optimizations, but it suffers from long communication delays due to its dependency on the MapReduce framework. The insights and conclusions from our evaluation provide a significant reference for improving the utilization of supercomputer resources in distributed deep learning.
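The abstract compares the tools on scalability and speedup across CPU nodes. As a minimal sketch (not taken from the paper), the Python snippet below shows how speedup and parallel efficiency are typically derived from measured per-epoch training times in such evaluations; the node counts and timings used here are hypothetical.

    # Minimal sketch, assuming the usual strong-scaling metrics; the timings
    # below are hypothetical and not taken from the paper.
    def speedup(t_one_node: float, t_n_nodes: float) -> float:
        """Speedup S(n) = T(1) / T(n)."""
        return t_one_node / t_n_nodes

    def efficiency(t_one_node: float, t_n_nodes: float, n: int) -> float:
        """Parallel (scaling) efficiency E(n) = S(n) / n."""
        return speedup(t_one_node, t_n_nodes) / n

    # Hypothetical per-epoch training times (seconds) on 1, 2, 4, 8 CPU nodes.
    timings = {1: 800.0, 2: 430.0, 4: 240.0, 8: 150.0}
    for n, t in timings.items():
        print(f"{n:>2} nodes: speedup = {speedup(timings[1], t):5.2f}, "
              f"efficiency = {efficiency(timings[1], t, n):6.1%}")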
Year
2018
Venue
ICA3PP
Field
Tianhe-2, Autoencoder, Supercomputer, Computer science, Parallel computing, Caffe, Artificial intelligence, Deep learning, Artificial neural network, Scalability, Speedup
DocType
Conference
Citations
1
PageRank
0.34
References
10
Authors
7
Name             Order  Citations  PageRank
Xin Du           1      127        26.78
Di Kuang         2      1          0.34
Yan Ye           3      54         12.55
Xinxin Li        4      27         8.16
Mengqiang Chen   5      1          0.68
Yunfei Du        6      72         14.62
Wei-Gang Wu      7      425        48.87