Characterizing Deep-Learning I/O Workloads in TensorFlow. - Citegraph

Paper Info

Title
Characterizing Deep-Learning I/O Workloads in TensorFlow.

Abstract
The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. During training, a large number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3× and 7.8× on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to fast, small-capacity storage and asynchronously copy the checkpoints to slower, large-capacity storage resulted in a performance improvement of 2.6× with respect to checkpointing directly to the slower storage on our benchmark environment.
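
The burst-buffer idea described in the abstract (write the checkpoint to fast storage, then copy it to slower storage in the background) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BurstBufferCheckpointer`, its methods, the file name, and the use of plain bytes in place of a real TensorFlow checkpoint are all assumptions made for illustration, and only the Python standard library is used.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class BurstBufferCheckpointer:
    """Hypothetical helper: save to fast storage, drain to slow storage asynchronously."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)
        self.slow_dir = Path(slow_dir)
        self.fast_dir.mkdir(parents=True, exist_ok=True)
        self.slow_dir.mkdir(parents=True, exist_ok=True)
        self._pending = []  # background copy threads that have not been joined yet

    def save(self, name, payload):
        # Blocking part: write the checkpoint to the fast, small-capacity tier.
        fast_path = self.fast_dir / name
        fast_path.write_bytes(payload)
        # Non-blocking part: copy to the slow, large-capacity tier in a thread,
        # so the caller (the training loop) can continue immediately.
        worker = threading.Thread(
            target=shutil.copy2, args=(fast_path, self.slow_dir / name)
        )
        worker.start()
        self._pending.append(worker)
        return fast_path

    def wait(self):
        # Block until every outstanding background copy has finished.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


def demo():
    # Two temporary directories stand in for the fast and slow storage tiers.
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
        ckpt = BurstBufferCheckpointer(fast, slow)
        ckpt.save("step_000100.ckpt", b"model weights placeholder")
        ckpt.wait()
        return (Path(slow) / "step_000100.ckpt").read_bytes()
```

In this sketch the training loop only pays for the write to the fast tier; the copy to the slow tier overlaps with subsequent computation, and `wait()` is needed only before shutdown or before reclaiming space on the fast tier.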

Year	2018
DOI	10.1109/PDSW-DISCS.2018.00011
Venue	2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS)
Keywords	Training, Pipelines, Checkpointing, Prefetching, Benchmark testing, Google
DocType	Conference
Volume	abs/1810.03035
Citations	8
PageRank	0.48
References	0
Authors	7

Authors (7 rows)

Name	Order	Citations	PageRank
Steven Wei Der Chien	1	35	3.24
Stefano Markidis	2	207	28.78
Chaitanya Prasad Sishtla	3	9	0.85
Luís Santos	4	110	14.58
Pawel Herman	5	8	1.50
Sai Narasimhamurthy	6	10	1.54
Erwin Laure	7	369	44.71
