Title
Performance under Failures of DAG-based Parallel Computing
Abstract
As the scale and complexity of parallel systems continue to grow, failures are becoming unavoidable when running large-scale applications. In this research, we present an analytical study that estimates the execution time of directed acyclic graph (DAG) based scientific applications in the presence of failures and provides guidelines for performance optimization. The study is fourfold. We first introduce a performance model to predict individual subtask computation time under failures. Next, a layered, iterative approach is adopted to transform a DAG into a layered DAG, which reflects the full dependencies among all the subtasks. Then, the expected execution time of the DAG under failures is derived based on stochastic analysis. Unlike existing models, the newly proposed performance model provides both the variance and the distribution. It is practical and can be put to real use. Finally, based on the model, performance optimization, weak point identification, and enhancement are proposed. Extensive simulations with real system traces are conducted to verify the analytical findings. They show that the proposed model and the weak point enhancement mechanism work well.
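To make the layering and expected-time ideas in the abstract concrete, the following is a minimal Python sketch, not the paper's model. It assumes that each layer ends with a barrier, that failures arrive as a Poisson process with a fixed rate, and that a failed subtask restarts from scratch after a fixed downtime. The names `layer_dag`, `run_subtask`, and `simulate_makespan`, along with all parameters, are hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def layer_dag(deps):
    """Group subtasks into layers.

    deps maps every subtask to the list of subtasks it depends on.
    A subtask with no dependencies is in layer 0; otherwise its layer
    is one more than the deepest layer among its dependencies.
    """
    layer = {}

    def depth(task):
        if task not in layer:
            layer[task] = 1 + max((depth(p) for p in deps[task]), default=-1)
        return layer[task]

    for task in deps:
        depth(task)
    grouped = defaultdict(list)
    for task, d in layer.items():
        grouped[d].append(task)
    return [sorted(grouped[d]) for d in sorted(grouped)]


def run_subtask(work, failure_rate, downtime, rng):
    """Sample one completion time (assumed restart-from-scratch policy)."""
    if failure_rate <= 0:
        return work  # no failures: the subtask takes exactly its work time
    elapsed = 0.0
    while True:
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= work:
            return elapsed + work  # finished before the next failure
        elapsed += time_to_failure + downtime  # lost work plus repair time


def simulate_makespan(deps, work, failure_rate, downtime, trials=2000, seed=0):
    """Monte Carlo estimate of the mean and variance of the DAG makespan."""
    rng = random.Random(seed)
    layers = layer_dag(deps)
    samples = []
    for _ in range(trials):
        # Each layer ends when its slowest subtask ends (assumed barrier).
        total = sum(
            max(run_subtask(work[t], failure_rate, downtime, rng) for t in layer)
            for layer in layers
        )
        samples.append(total)
    mean = sum(samples) / trials
    variance = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, variance


# Hypothetical diamond-shaped DAG: b and c depend on a, and d depends on both.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
work = {"a": 1, "b": 2, "c": 3, "d": 1}
mean, variance = simulate_makespan(deps, work, failure_rate=0.1, downtime=0.5)
```

Comparing the estimate against a failure-free run (`failure_rate=0`) gives a rough sense of how much failures inflate the makespan. The layer whose slowest subtask grows the most is a natural candidate for the kind of weak point the abstract mentions.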
Year
2009
DOI
10.1109/CCGRID.2009.55
Venue
CCGrid
Keywords
real use, optimisation, execution time, parallel processing, layered dag, iterative approach, dag-based parallel computing, directed acyclic graph, real system trace, failure modeling, parallel system complexity, application performance, fault-tolerance, performance model, performance optimization, directed graphs, expected execution time, individual subtask computation time, analytical study, stochastic analysis, analytical finding, stochastic processes, data mining, parallel systems, computational modeling, probability density function, fault tolerant, fault tolerance, parallel computer, predictive models, accuracy, failure analysis, iterative methods, optimization
Field
Iterative method, Computer science, Parallel computing, Directed graph, Stochastic process, Directed acyclic graph, Fault tolerance, Execution time, Probability density function, Distributed computing, Computation
DocType
Conference
ISBN
978-0-7695-3622-4
Citations
7
PageRank
0.53
References
21
Authors
5
Name | Order | Citations | PageRank
Hui Jin | 1 | 44 | 2.97
Xian-he Sun | 2 | 1987 | 182.64
Ziming Zheng | 3 | 272 | 13.57
Zhiling Lan | 4 | 818 | 54.25
Bing Xie | 5 | 52 | 11.39