Title
Towards a control-theory approach for minimizing unused grid resources
Abstract
HPC systems exhibit increasing variability in their behavior, e.g., in performance and power consumption, and this loss of predictability calls for more runtime management. Such management can be performed by an Autonomic Management feedback loop. The loop monitors the system, analyzes the collected data, and uses the results to activate appropriate system-level or application-level mechanisms (e.g., informing schedulers, down-clocking CPUs). One such problem arises in CiGri, a simple, lightweight, scalable and fault-tolerant grid system that exploits the unused resources of a set of computing clusters. Computing power left over after the main HPC applications have been scheduled is used to execute smaller jobs, which are injected as far as the global system allows. This paper presents first results on automated resource management in an HPC infrastructure. It uses techniques from Control Theory to design a controller that maximizes cluster utilization while avoiding overload. We implement a Proportional-Integral (PI) feedback controller in the system software. In response to monitored information about the number of jobs currently being processed, the controller sets the maximum number of jobs to be sent to the cluster.
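The PI feedback loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' controller or system model. The gains (kp, ki), the setpoint of 100 jobs, the 50-job submission cap, and the toy "fluid" cluster model, in which a fixed number of jobs complete per period, are all hypothetical values chosen only to show the technique. At each period the controller compares the measured number of jobs in the cluster with a target and returns the maximum number of jobs to submit next.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller with output clamping.

    Uses conditional-integration anti-windup. All parameter values used
    below are hypothetical and chosen only for illustration.
    """
    kp: float                    # proportional gain (hypothetical)
    ki: float                    # integral gain (hypothetical)
    setpoint: float              # target number of jobs in the cluster (hypothetical)
    u_min: float = 0.0           # cannot submit a negative number of jobs
    u_max: float = float("inf")  # cap on jobs submitted per period
    integral: float = 0.0        # running sum of tracking errors

    def step(self, measured: float) -> float:
        """Return the maximum number of jobs to submit during the next period."""
        error = self.setpoint - measured
        candidate = self.integral + error
        u = self.kp * error + self.ki * candidate
        if self.u_min <= u <= self.u_max:
            # Output within limits: commit the integral update.
            self.integral = candidate
            return u
        # Output saturated: freeze the integral (anti-windup) and clamp.
        return min(max(u, self.u_min), self.u_max)


def simulate(steps: int, completed_per_step: float = 20.0) -> tuple[list[float], list[float]]:
    """Run the loop on a hypothetical fluid model of a cluster.

    The model is not the paper's:
    load[k+1] = max(0, load[k] + u[k] - completed_per_step).
    """
    ctrl = PIController(kp=0.5, ki=0.1, setpoint=100.0, u_max=50.0)
    load = 0.0
    loads, inputs = [], []
    for _ in range(steps):
        u = ctrl.step(load)  # controller decides the submission bound
        load = max(0.0, load + u - completed_per_step)
        loads.append(load)
        inputs.append(u)
    return loads, inputs


if __name__ == "__main__":
    loads, inputs = simulate(50)
    for k in range(0, 50, 10):
        # In practice the bound would be rounded to a whole number of jobs.
        print(f"step {k:2d}: load={loads[k]:7.2f}  max_jobs={int(inputs[k])}")
```

With these illustrative gains, the linear part of the closed loop has poles at 0.7 ± 0.1j, so the load settles at the setpoint and the submission bound converges to the completion rate. The integral is frozen while the output is clamped, which prevents the windup that would otherwise cause a large overshoot after the initial saturated periods.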
Year: 2018
DOI: 10.1145/3217197.3217201
Venue: Proceedings of the 1st International Workshop on Autonomous Infrastructure for Science (AI-Science 2018)
Keywords: High performance computing, resource management, self-adaptive systems, autonomic computing, control theory
DocType: Conference
ISBN: 978-1-4503-5862-0
Citations: 1
PageRank: 0.38
References: 0
Authors: 6
Name | Order | Citations | PageRank
Emmanuel Stahl | 1 | 1 | 0.38
Agustín Gabriel Yabo | 2 | 1 | 0.38
Olivier Richard | 3 | 409 | 22.76
Bruno Bzeznik | 4 | 5 | 0.85
Bogdan Robu | 5 | 14 | 3.41
Éric Rutten | 6 | 255 | 30.50