Title
A Comprehensive Perspective on Pilot-Job Systems
Abstract
Pilot-Job systems play an important role in supporting distributed scientific computing. They are used to execute millions of jobs on several cyberinfrastructures worldwide, consuming billions of CPU hours a year. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also being adopted beyond traditional domains. Notwithstanding the growing impact on scientific research, there is no agreement on a definition of a Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This article offers a comprehensive analysis of Pilot-Job systems, critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this article are as follows: (1) an analysis of the motivations and evolution of Pilot-Job systems; (2) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (3) the description of core and auxiliary properties of Pilot-Job systems and the analysis of six exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing.
Year
2018
DOI
10.1145/3177851
Venue
ACM Computing Surveys (CSUR)
Keywords
Distributed applications, Pilot-Jobs, distributed systems
DocType
Journal
Volume
51
Issue
2
ISSN
0360-0300
Citations
5
PageRank
0.50
References
0
Authors
3
Name, Order, Citations, PageRank
Matteo Turilli, 1, 84, 16.21
Mark Santcroos, 2, 70, 8.11
Shantenu Jha, 3, 188, 32.40