Title
FTLLS: A fault tolerant, low latency, distributed scheduling approach based on sparrow.
Abstract
Big data processing systems are moving towards higher degrees of parallelism and shorter task durations in order to achieve lower response times. Scheduling highly parallel tasks that complete in under a second poses a great challenge to traditional centralized schedulers. To meet this challenge, researchers have turned to distributed scheduling approaches, which avoid the throughput limitation of centralized schedulers and among which Sparrow is a leading design. However, little effort has been devoted to the fault tolerance of Sparrow, and there are problems with Sparrow's sample-based techniques; together these give rise to incomplete jobs and large scheduling latency. We therefore present Fault Tolerant, Low Latency Sparrow (FTLLS), which extends Sparrow with an assistant machine to handle worker failures and to make better scheduling decisions. Simulations show that FTLLS detects worker failures more quickly than a naive timeout approach and makes better scheduling decisions than native Sparrow. Results from an implementation show that FTLLS guarantees no incomplete jobs in the presence of worker failures and reduces scheduling latencies by over 1.5× compared to native Sparrow. In addition, the simplicity of the idea adopted by FTLLS makes it applicable to a wide variety of distributed scheduling approaches.
Year
2018
DOI
10.1007/s12083-017-0590-4
Venue
Peer-to-Peer Networking and Applications
Keywords
Fault tolerance, Low latency, Distributed scheduling, Sparrow, Big data
Field
Fair-share scheduling, Latency (engineering), Scheduling (computing), Computer science, Computer network, Timeout, Fault tolerance, Throughput, Sparrow, Distributed computing
DocType
Journal
Volume
11
Issue
5
ISSN
1936-6442
Citations
0
PageRank
0.34
References
13
Authors
2
Name | Order | Citations | PageRank
Wenzhuo Li | 1 | 31 | 5.70
Chuang Lin | 2 | 3040 | 390.74