Title
Adaptive parallel job scheduling with flexible coscheduling
Abstract
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these applications can suffer severe performance penalties if their processes are not all coscheduled to run together. Two common approaches to coscheduling jobs are batch scheduling, wherein nodes are dedicated for the duration of the run, and gang scheduling, wherein time slicing is coordinated across processors. Both work well when jobs are load-balanced and make use of the entire parallel machine. However, these conditions are rarely met and most realistic workloads consequently suffer from both internal and external fragmentation, in which resources and processors are left idle because jobs cannot be packed with perfect efficiency. This situation leads to reduced utilization and suboptimal performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job's computation granularity and communication pattern and scheduling jobs based on their synchronization and load-balancing requirements. In particular, jobs that do not require stringent synchronization are identified, and are not coscheduled; instead, these processes are used to reduce fragmentation. FCS has been fully implemented on top of the STORM resource manager on a 256-processor Alpha cluster and compared to batch, gang, and implicit coscheduling algorithms. This paper describes in detail the implementation of FCS and its performance evaluation with a variety of workloads, including large-scale benchmarks, scientific applications, and dynamic workloads. The experimental results show that FCS saturates at higher loads than other algorithms (up to 54 percent higher in some cases), and displays lower response times and slowdown than the other algorithms in nearly all scenarios.
Year
2005
DOI
10.1109/TPDS.2005.130
Venue
Parallel and Distributed Systems, IEEE Transactions
Keywords
parallel architectures, parallel machines, processor scheduling, resource allocation, synchronisation, workstation clusters, batch scheduling, cluster computing, flexible coscheduling, gang scheduling, high-performance computing, job scheduling, load-balancing, parallel architecture, parallel machine, synchronization, time slicing, load balancing
Field
Computer science, Coscheduling, Load balancing (computing), Scheduling (computing), Parallel computing, Gang scheduling, Real-time computing, Schedule, Resource allocation, Job scheduler, Computer cluster, Distributed computing
DocType
Journal
Volume
16
Issue
11
ISSN
1045-9219
Citations
29
PageRank
1.17
References
23
Authors
4
Name                Order  Citations  PageRank
Eitan Frachtenberg  1      1060       85.08
Feitelson, D.G.     2      265        22.76
Fabrizio Petrini    3      2050       165.82
Juan Fernandez      4      269        23.17