Title
An Evaluation of Communication Factors on an Adaptive Control Strategy for Job Co-allocation in Multiple HPC Clusters
Abstract
Allocating multi-process jobs across multiple connected high-performance computing clusters, i.e., job co-allocation, offers the possibility of more efficient use of computing resources, reduced turnaround time, and computations that use more processes than there are processors on any single cluster. The effectiveness of co-allocation ultimately depends on the inter-cluster communication cost. We previously introduced a scalable co-allocation strategy, the maximum bandwidth adjacent cluster set (MBAS) strategy, which uses two thresholds to control job co-allocation: one dealing with inter-cluster links and one controlling job partitioning. We subsequently introduced the adaptive threshold control system (ATCS), which uses a fuzzy control approach to dynamically adjust these thresholds within MBAS. Results suggested that using ATCS during MBAS job co-allocation could achieve an overall performance improvement. However, those results considered only jobs with either master-slave or all-all communication among their constituent processes. In this paper, we extend the analysis to jobs that exhibit 2D-mesh communication patterns and evaluate ATCS further.
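As an illustration only, and not taken from the paper, the three communication patterns named in the abstract (master-slave, all-all, and 2D-mesh) can be compared by counting how many communicating process pairs end up on different clusters when a job is partitioned. The sketch below assumes undirected pairs, a 4-neighbor mesh without wrap-around, and a naive split of process IDs across two clusters; all function names are hypothetical, and it does not model MBAS or ATCS themselves.

```python
from itertools import combinations


def master_slave_pairs(n):
    """Process 0 acts as master; every other process talks only to it."""
    return {(0, p) for p in range(1, n)}


def all_to_all_pairs(n):
    """Every process communicates with every other process."""
    return set(combinations(range(n), 2))


def mesh_2d_pairs(rows, cols):
    """Each process talks to its right and lower neighbor (no wrap-around)."""
    pairs = set()
    for r in range(rows):
        for c in range(cols):
            p = r * cols + c
            if c + 1 < cols:
                pairs.add((p, p + 1))
            if r + 1 < rows:
                pairs.add((p, p + cols))
    return pairs


def cross_cluster_pairs(pairs, cluster_of):
    """Count communicating pairs whose endpoints sit on different clusters."""
    return sum(1 for a, b in pairs if cluster_of[a] != cluster_of[b])


# Assumed placement: 16 processes, IDs 0-7 on cluster 0 and 8-15 on cluster 1.
placement = [0] * 8 + [1] * 8

ms_cross = cross_cluster_pairs(master_slave_pairs(16), placement)   # 8 of 15 pairs
all_cross = cross_cluster_pairs(all_to_all_pairs(16), placement)    # 64 of 120 pairs
mesh_cross = cross_cluster_pairs(mesh_2d_pairs(4, 4), placement)    # 4 of 24 pairs
```

Under this toy placement, the mesh job crosses the inter-cluster link for far fewer pairs than the all-all job, which suggests why the communication pattern could matter when deciding how to partition a job.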
Year
2009
DOI
10.1109/ICPADS.2009.36
Venue
ICPADS
Keywords
job partitioning,multi-process job,job co-allocation,multiple hpc clusters,all-all communication,controlling job partitioning,maximum bandwidth adjacent cluster set,communication factors,intercluster link,effective co-allocation,adaptive threshold control system,maximum bandwidth adjacent cluster,communication factor,efficient use,2d-mesh communication pattern,high performance computing,high-performance computing clusters,resource management,adaptive control,multiple hpc cluster,communication pattern,job coallocation,workstation clusters,fuzzy control,telecommunication control,mbas job co-allocation,adaptive control strategy,scalable coallocation strategy,scalable co-allocation strategy,control system,bandwidth,resource manager,control systems,adaptive thresholding,master slave
Field
Resource management,Supercomputer,Computer science,Computer network,Real-time computing,Bandwidth (signal processing),Control system,Adaptive control,Fuzzy control system,Scalability,Distributed computing,Performance improvement
DocType
Conference
ISSN
1521-9097
ISBN
978-1-4244-5788-5
Citations
1
PageRank
0.35
References
11
Authors
2
Name
Order
Citations
PageRank
Jinhui Qin1143.82
Michael A. Bauer233178.68