Title
Towards Performant Workflows, Monitoring and Measuring
Abstract
As part of the U.S. National Science Foundation (NSF) funded XD Metrics Service project, we are developing tools and techniques for the audit and analysis of High Performance Computing (HPC) and Cloud infrastructure. This includes a suite of tools for the analysis of HPC jobs, based on performance metrics collected from compute nodes. To date, we have developed two closely related utilities: XDMoD, which was designed to monitor usage and performance of NSF’s innovative HPC resources (known as XSEDE), and Open XDMoD, which was designed to monitor usage and performance in academic, governmental or commercial cyberinfrastructures. Considerable effort has been made to continually improve XDMoD, in order to capture the most important aspects of modern research computing. One area in which XDMoD is lacking is in tracking workflows, which are broadly defined as comprising data transfer/input and one or more computational steps. As data sets have become larger, data movement has become more time- and resource-intensive, and hence more important to characterize. In addition, multi-step workflows, in which one input spawns a complex series of processes, are becoming more common. Although XDMoD currently captures some of the information required to properly track complex workflows, some key data are clearly missing. In this paper, we discuss the existing state of workflow monitoring, and suggest strategies to improve on the information captured.
Year
2020
DOI
10.1109/ICCCN49398.2020.9209647
Venue
2020 29th International Conference on Computer Communications and Networks (ICCCN)
Keywords
Computer Performance, Data Processing
DocType
Conference
ISSN
1095-2055
ISBN
978-1-7281-6607-0
Citations
0
PageRank
0.34
References
0
Authors
9