Abstract | ||
---|---|---|
The performance of Deep-Learning (DL) computing frameworks rely on the performance of data ingestion and checkpointing. In fact, during the training, a considerable high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3× and 7.8× on our benchmark environments. The use of the tensorFlow prefetcher results in a complete overlap of computation on accelerator and input pipeline on CPU eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small capacity storage and copy asynchronously the checkpoints to a slower large capacity storage resulted in a performance improvement of 2.6× with respect to checkpointing directly to slower storage on our benchmark environment. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/PDSW-DISCS.2018.00011 | 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS) |
Keywords | DocType | Volume |
Training,Pipelines,Checkpointing,Prefetching,Benchmark testing,Google | Conference | abs/1810.03035 |
Citations | PageRank | References |
8 | 0.48 | 0 |
Authors | ||
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Steven Wei Der Chien | 1 | 35 | 3.24 |
Stefano Markidis | 2 | 207 | 28.78 |
Chaitanya Prasad Sishtla | 3 | 9 | 0.85 |
Luís Santos | 4 | 110 | 14.58 |
Pawel Herman | 5 | 8 | 1.50 |
Sai Narasimhamurthy | 6 | 10 | 1.54 |
Erwin Laure | 7 | 369 | 44.71 |