Abstract |
---|
Scientific applications frequently spend a large fraction of their execution time reading and writing data on parallel file systems. Identifying these I/O performance bottlenecks and attributing their root causes are critical steps toward devising optimization strategies. Several existing studies analyze I/O logs from a set of benchmarks or applications that were run with controlled behaviors. However, there is still no general approach that systematically identifies I/O performance bottlenecks for applications running "in the wild" on production systems. In this study, we develop an analysis approach that "zooms in" from platform-wide to application-wide to job-level I/O logs to identify I/O bottlenecks in arbitrary scientific applications. We analyze logs collected on a production Cray XC40 system over a two-month period. The study yields several insights that application developers can use to optimize I/O behavior. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/CCGRID.2019.00021 | 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) |
Keywords | Field | DocType |
Darshan, Lustre Monitoring Tools, Slurm, IO Analysis, IO Trace | Computer science, Zoom, Input/output, Execution time, Distributed computing | Conference |
ISSN | ISBN | Citations |
2376-4414 | 978-1-7281-0913-8 | 3 |
PageRank | References | Authors |
0.38 | 9 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Teng Wang | 1 | 336 | 42.78 |
Surendra Byna | 2 | 551 | 39.65 |
Glenn K. Lockwood | 3 | 20 | 4.06 |
Shane Snyder | 4 | 64 | 8.38 |
Philip H. Carns | 5 | 964 | 62.51 |
Sunggon Kim | 6 | 9 | 3.85 |
Nicholas J. Wright | 7 | 408 | 27.79 |