Title
Flux: Overcoming scheduling challenges for exascale workflows
Abstract
Many emerging scientific workflows that target high-end HPC systems require complex interplay with the resource and job management software (RJMS). However, portable, efficient and easy-to-use scheduling and execution of these workflows is still an unsolved problem. We present Flux, a novel, hierarchical RJMS infrastructure that addresses the key scheduling challenges of modern workflows in a scalable, easy-to-use, and portable manner. At the heart of Flux lies its ability to be seamlessly nested within batch allocations created by other schedulers as well as itself. Once a hierarchy of Flux instances is created within each allocation, its consistent and rich set of well-defined APIs portably and efficiently supports workflows that often feature non-traditional execution patterns, such as requirements for complex co-scheduling, massive ensembles of small jobs, and coordination among jobs in an ensemble. Our evaluation of Flux on some of the emerging workflow efforts at Lawrence Livermore National Laboratory indicates that our approach can significantly address major workflow scheduling challenges: job throughput, co-scheduling, job coordination and communication, and portability. Further, our performance evaluation on both synthetic and real-world ensemble-based workflows suggests that our solution can improve the job throughput performance of these scientific workflows by a factor of 48.
Year
2020
DOI
10.1016/j.future.2020.04.006
Venue
Future Generation Computer Systems
DocType
Journal
Volume
110
ISSN
0167-739X
Citations
2
PageRank
0.38
References
0
Authors
12
Name                  Order  Citations  PageRank
Dong H. Ahn           1      325        22.61
Ned Bass              2      2          0.38
Albert Chu            3      2          0.38
Jim Garlick           4      17         1.43
Mark Grondona         5      17         1.43
Stephen Herbein       6      30         5.55
Helgi I. Ingólfsson   7      2          1.05
Joseph Koning         8      2          0.38
Tapasya Patki         9      137        8.98
Thomas Scogland       10     85         8.24
Becky Springmeyer     11     17         1.43
Michela Taufer        12     352        53.04