Title
A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks
Abstract
Scientific applications frequently spend a large fraction of their execution time reading and writing data on parallel file systems. Identifying these I/O performance bottlenecks and attributing their root causes are critical steps toward devising optimization strategies. Several existing studies analyze the I/O logs of benchmarks or applications that were run under controlled conditions. However, there is still no general approach that systematically identifies I/O performance bottlenecks for applications running "in the wild" on production systems. In this study, we develop an analysis approach that "zooms in" from platform-wide to application-wide to job-level I/O logs to identify I/O bottlenecks in arbitrary scientific applications. We analyze logs collected on a production Cray XC40 system over a two-month period. The study yields several insights that application developers can use to optimize I/O behavior.
Year: 2019
DOI: 10.1109/CCGRID.2019.00021
Venue: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Keywords: Darshan, Lustre Monitoring Tools, Slurm, IO Analysis, IO Trace
Field: Computer science, Zoom, Input/output, Execution time, Distributed computing
DocType: Conference
ISSN: 2376-4414
ISBN: 978-1-7281-0913-8
Citations: 3
PageRank: 0.38
References: 9
Authors: 7
Name                 Order  Citations  PageRank
Teng Wang            1      336        42.78
Surendra Byna        2      551        39.65
Glenn K. Lockwood    3      20         4.06
Shane Snyder         4      64         8.38
Philip H. Carns      5      964        62.51
Sunggon Kim          6      9          3.85
Nicholas J. Wright   7      408        27.79