Title
Workload-Aware Scheduling for Data Analytics upon Heterogeneous Storage
Abstract
A trend in today's data centers is the wide deployment of heterogeneous storage devices, such as SSDs and HDDs, to meet the diverse demands of various big data workloads. Since the read performance of these devices differs considerably, traditional concurrent data fetching easily leads to unbalanced use among devices. As a result, stragglers in data fetching, caused by this unbalanced use, directly increase the overall latency of data analytics. To avoid such unbalanced use when fetching large volumes of data concurrently from storage devices, we formulate the Workload-Aware Scheduling problem for Heterogeneous storage devices (WASH), whose goal is to minimize the maximum data fetching time for parallel data analytics tasks. We design a randomized algorithm (rWASH) that selects a proper source device for each task based on carefully calculated probabilities, and whose result can be proved to concentrate around the optimum with high probability. Extensive experiments show that rWASH reduces the average data fetching time for tasks by up to 55% over state-of-the-art algorithms.
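The idea of randomized source selection can be illustrated with a minimal sketch. This is not the paper's rWASH algorithm: the abstract does not state how the probabilities are computed, so the sketch assumes a simple stand-in in which each task picks one of its replica devices with probability proportional to that device's read bandwidth. The names `pick_sources`, `max_fetch_time`, and the input layout are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def pick_sources(tasks, bandwidth, rng):
    """Randomly choose a source device for every task.

    tasks:     {task_id: (size_mb, [device ids holding a replica])}
    bandwidth: {device_id: read bandwidth in MB/s}
    Assumption (not from the paper): the selection probability is
    proportional to the read bandwidth of each candidate device.
    """
    assignment = {}
    for task_id, (_size_mb, replicas) in tasks.items():
        weights = [bandwidth[d] for d in replicas]
        assignment[task_id] = rng.choices(replicas, weights=weights, k=1)[0]
    return assignment


def max_fetch_time(tasks, bandwidth, assignment):
    """Objective named in the abstract: the maximum fetch time over devices.

    A device's fetch time is modeled here as (total MB assigned) / (MB/s),
    which is a simplifying assumption.
    """
    load = defaultdict(float)
    for task_id, device in assignment.items():
        load[device] += tasks[task_id][0]
    return max((load[d] / bandwidth[d] for d in load), default=0.0)


if __name__ == "__main__":
    tasks = {
        "t1": (100.0, ["ssd"]),
        "t2": (200.0, ["ssd", "hdd"]),
        "t3": (150.0, ["ssd", "hdd"]),
    }
    bandwidth = {"ssd": 500.0, "hdd": 100.0}
    chosen = pick_sources(tasks, bandwidth, random.Random(0))
    print(chosen, max_fetch_time(tasks, bandwidth, chosen))
```

Weighting by bandwidth steers most tasks toward faster devices while still spreading load, which is the kind of balance the WASH objective rewards. The paper's probabilities are presumably derived differently and come with a concentration guarantee that this sketch does not provide.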
Year
2019
DOI
10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00088
Venue
2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)
Keywords
big data analytics, heterogeneous storage devices, workload aware scheduling
DocType
Conference
ISBN
978-1-7281-4329-3
Citations
0
PageRank
0.34
References
0
Authors (7)
Name             Order   Citations   PageRank
Zhuzhong Qian    1       380         51.27
Yuan Gao         2       0           0.34
Mingtao Ji       3       0           0.68
Hui Peng         4       0           0.34
Chen Peng        5       100         14.00
Yibo Jin         6       5           3.78
Sanglu Lu        7       1380        144.07