Title
Characterizing Scientific Workflows on HPC Systems using Logs
Abstract
Scientific advances depend on the ability to effectively and efficiently use high performance computing (HPC) systems to manage and run large, complex scientific workflows. Towards understanding the characteristics of these large scientific workflows, we propose two methods to identify workflows with temporal connections and data-dependencies from the batch queue and I/O logs available at HPC systems. We use the two methods to characterize workflows and to correlate workflow runtime with node requests, I/O patterns, and resource usage on three months of log data available for Cori, a supercomputer at NERSC. A key result of our analyses is that single-job workflows often do not use all allocated CPUs, which provides an opportunity to consider allocating resources at a finer granularity.
Year
2020
DOI
10.1109/WORKS51914.2020.00013
Venue
2020 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)
Keywords
log data, single-job workflows, HPC systems, high performance computing systems, scientific workflows, temporal connections, data-dependencies, batch queue, workflow runtime, CPUs, supercomputer, NERSC, Cori, I/O patterns, node requests
DocType
Conference
ISBN
978-1-6654-0453-2
Citations
1
PageRank
0.35
References
0
Authors
7
Name                   Order  Citations  PageRank
Devarshi Ghoshal       1      57         8.83
Brian Austin           2      33         5.23
Deborah Bard           3      14         0.92
Chris Daley            4      31         5.02
Glenn K. Lockwood      5      8          1.85
Nicholas J. Wright     6      408        27.79
Lavanya Ramakrishnan   7      710        56.18