Title
Assessing the Impact of Concurrent Replication with Canceling in Parallel Jobs
Abstract
Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks fails if any of its tasks fails, requiring reprocessing and incurring additional delays. In this paper, we explore the effect that replicating parallel jobs has on job reliability and response time, as well as on resource utilization. The replication mechanism consists of concurrently processing replicas, at either the job or the task level, retrieving the results of the replica that finishes first, if any, and canceling any remaining replicas still in process. We propose a stochastic model that explicitly considers parallel job processing and replication at both the job and the task level, and that handles general arrival processes. We develop a numerically efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase job reliability but also have the potential to reduce response times.
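The replication mechanism described in the abstract (run replicas concurrently, keep the result of the first one that finishes successfully, cancel the rest) can be illustrated with a minimal sketch. This is not the paper's stochastic model or its algorithm; it only mimics the mechanism with Python's asyncio. The names run_replica and first_successful, and the (duration, fails) replica description, are assumptions made up for this illustration.

```python
import asyncio


async def run_replica(replica_id, duration, fails, log):
    """Hypothetical replica: works for `duration` seconds, then succeeds or fails."""
    try:
        await asyncio.sleep(duration)  # stands in for the actual processing
    except asyncio.CancelledError:
        log.append(("cancelled", replica_id))  # record that the work was cut short
        raise
    if fails:
        raise RuntimeError(f"replica {replica_id} failed")
    return replica_id


async def first_successful(specs):
    """Run all replicas concurrently and cancel the leftovers.

    `specs` is a list of (duration, fails) pairs, one per replica (assumed format).
    Returns (winner_id, log); winner_id is None if every replica fails.
    """
    log = []
    pending = {
        asyncio.create_task(run_replica(i, duration, fails, log))
        for i, (duration, fails) in enumerate(specs)
    }
    winner = None
    while pending and winner is None:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            # A failed replica is skipped; the others keep running.
            if task.exception() is None and winner is None:
                winner = task.result()
    # Cancel any replica still in process and wait for the cancellations to land.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return winner, log


if __name__ == "__main__":
    # Replica 0 fails early, replica 1 finishes first among the rest, replica 2 is cancelled.
    print(asyncio.run(first_successful([(0.05, True), (0.1, False), (0.5, False)])))
```

Canceling the remaining replicas as soon as one succeeds is what limits the extra resource usage: in this sketch the slowest replica stops after about 0.1 s instead of running for its full 0.5 s.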
Year: 2014
DOI: 10.1109/MASCOTS.2014.13
Venue: MASCOTS
Keywords: parallel processing, response time, scientific computing, job level, resource utilization, task cancellation mechanism, job reliability, explicit analysis, concurrency control, task level, replication mechanism, resource allocation, response time reduction, software applications, numerically-efficient algorithm, parallel job replication, grid data centers, stochastic model, general arrival process handling, cloud computing, concurrent replica processing, parallel job processing, performance metrics, iterative methods, large-scale instances, concurrent replication, parallel task failure, parallelization, vectors, generators, reliability, computational modeling
DocType: Conference
ISSN: 1526-7539
Citations: 9
PageRank: 0.53
References: 15
Authors: 2
Name             Order  Citations  PageRank
Zhan Qiu         1      39         4.51
Juan F. Pérez    2      106        11.80