Title
Design and analysis of fault tolerance mechanism for Sparrow
Abstract
Big data processing frameworks are moving towards higher degrees of parallelism and shorter task durations in order to achieve lower response times. Scheduling highly parallel tasks that complete in about 100 milliseconds poses a major challenge for task schedulers. To meet this challenge, researchers have turned to decentralized frameworks that relieve the pressure on task schedulers, among which Sparrow is a good choice. However, little effort has been devoted to fault tolerance in Sparrow, which does not handle worker failures and can therefore leave tasks incomplete. We present a fault tolerance mechanism named Heartbeat on Sparrow to handle failures of worker machines. Through simulation, we compare it with a simple mechanism. The results show that Heartbeat on Sparrow detects worker failures faster and reschedules all failed tasks more efficiently, recovering tasks and state in sub-second time. We hope this mechanism will contribute to fault tolerance in Sparrow and in other decentralized designs.
Year: 2014
DOI: 10.1109/PCCC.2014.7017054
Venue: IPCCC
Keywords: decentralized framework, decentralized design, parallel processing, failure detector, scheduling, parallelism degree, fault tolerant computing, task schedulers, big data processing framework, heartbeat mechanism, decentralized task scheduling, failure recovery, sparrow, fault tolerance mechanism, big data, heart rate variability, fault tolerance, detectors
Field: Big data processing, Heartbeat, Heart beat, Computer science, Scheduling (computing), Response time, Software fault tolerance, Real-time computing, Fault tolerance, Sparrow, Distributed computing
DocType: Conference
ISSN: 1097-2641
Citations: 1
PageRank: 0.36
References: 8
Authors: 2
Name         Order  Citations  PageRank
Wenzhuo Li   1      31         5.70
Chuang Lin   2      3040       390.74