Title
Reducing Fragmentation on Torus-Connected Supercomputers
Abstract
Torus-based networks are prevalent on leadership-class petascale systems, providing a good balance between network cost and performance. The major disadvantage of this network architecture is its susceptibility to fragmentation. Many studies have attempted to reduce resource fragmentation in this architecture. Although the suggested approaches can make good allocation decisions that reduce fragmentation at job start time, none of them considers a job's wall time, so resources can still become fragmented when neighboring jobs do not complete at around the same time. In this paper, we propose a wall time-aware job allocation strategy, which adjacently packs jobs that finish around the same time, in order to minimize resource fragmentation caused by job length discrepancy. Event-driven simulations using real job traces from a production Blue Gene/P system at Argonne National Laboratory demonstrate that our wall time-aware strategy can effectively reduce system fragmentation and improve overall system performance.
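The packing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a 1-D line of nodes instead of a 3-D torus (no wraparound), and every name in it (`Running`, `free_slots`, `place`) is hypothetical. Among the free contiguous slots, it picks the one whose adjacent running job is expected to finish closest to the new job's expected end time, and falls back to the first free slot when no slot has a neighbor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Running:
    start: int        # first node index occupied (inclusive)
    size: int         # number of contiguous nodes occupied
    end_time: float   # expected completion: start time + requested wall time


def free_slots(total: int, running: List[Running], size: int) -> List[int]:
    """Return every start index where `size` contiguous nodes are free."""
    busy = [False] * total
    for job in running:
        for i in range(job.start, job.start + job.size):
            busy[i] = True
    return [s for s in range(total - size + 1) if not any(busy[s:s + size])]


def closest_neighbor_gap(start: int, size: int, end_time: float,
                         running: List[Running]) -> float:
    """Smallest |end-time difference| to a job directly adjacent to the slot.

    Returns infinity when the slot touches no running job.
    """
    gaps = [abs(job.end_time - end_time)
            for job in running
            if job.start + job.size == start or start + size == job.start]
    return min(gaps, default=float("inf"))


def place(total: int, running: List[Running], size: int,
          now: float, walltime: float) -> Optional[int]:
    """Pick a start index for the new job, or None if nothing fits."""
    end_time = now + walltime
    candidates = free_slots(total, running, size)
    if not candidates:
        return None
    # min() keeps the first candidate on ties, so the fallback is first fit.
    return min(candidates,
               key=lambda s: closest_neighbor_gap(s, size, end_time, running))


# Hypothetical 16-node line: one job ends at t=100, another at t=500.
jobs = [Running(start=0, size=4, end_time=100.0),
        Running(start=12, size=4, end_time=500.0)]
long_job_slot = place(16, jobs, size=4, now=0.0, walltime=480.0)
short_job_slot = place(16, jobs, size=4, now=0.0, walltime=90.0)
```

In this toy setup the long job lands next to the job that ends at t=500 and the short job next to the one that ends at t=100, so each pair of neighbors frees a larger contiguous region at roughly the same time. A first-fit policy would put both at the lowest free index regardless of wall time.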
Year
2011
DOI
10.1109/IPDPS.2011.82
Venue
IPDPS
Keywords
neighboring job,real job trace,resource fragmentation,system fragmentation,leadership-class petascale system,torus-connected supercomputers,wall time-aware job allocation,p system,job start time,overall system performance,job length,computer architecture,discrete event simulation,system performance,resource management,cobalt,network architecture,scheduling,three dimensional,resource manager
Field
Resource management,Scheduling (computing),Computer science,Parallel computing,Network architecture,Torus,Fragmentation (computing),Network cost,Petascale computing,Distributed computing,Discrete event simulation
DocType
Conference
Citations
18
PageRank
0.72
References
22
Authors
5
Name             Order  Citations  PageRank
Wei Tang         1      152        10.65
Zhiling Lan      2      818        54.25
Narayan Desai    3      319        29.73
Daniel Buettner  4      115        6.33
Yongen Yu        5      80         4.55