Title
Disk cache-aware task scheduling for data-intensive and many-task workflow
Abstract
Workflow scheduling that maximizes I/O performance is one of the key issues in data-intensive, many-task computing. In our previous work, we proposed a locality-aware workflow scheduling method based on multi-constraint graph partitioning. In this work, we focus on the performance of reading input files from the disk cache (the buffer cache or page cache in main memory). To maximize the disk cache hit rate for input files, LIFO-order scheduling is effective, because newly created intermediate files may be read soon after they are created. However, the LIFO policy suffers from the so-called “trailing task problem.” We propose a hybrid scheduling strategy that combines LIFO and HRF (Highest Rank First). To avoid the trailing task problem, our strategy applies one of the two policies depending on the number of highest-rank tasks in the queue. In addition, we propose scheduling that overlaps computation with I/O. We implement our scheduling strategy for the Pwrake workflow system and the Gfarm distributed file system and evaluate it by executing data-intensive workflows on a computer cluster. Our strategy improves the performance of the copyfile workflow by 30% owing to a higher disk cache hit rate, and that of the Montage workflow by 12% owing to higher core utilization.
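The abstract states that the hybrid strategy picks LIFO or HRF according to the number of highest-rank tasks in the queue, but it does not define rank, the threshold, or which policy applies on which side of it. The sketch below illustrates the idea under explicit assumptions only: rank is taken to be the number of edges on the longest path from a task to a sink of the workflow DAG; HRF is applied when the number of highest-rank ready tasks is at or below a hypothetical `threshold` (for example, the number of worker cores), and LIFO is applied otherwise. `compute_ranks`, `HybridQueue`, `run_serial`, and `WORKFLOW` are illustrative names, not part of Pwrake or Gfarm.

```python
from collections import defaultdict


def compute_ranks(successors: dict[str, list[str]]) -> dict[str, int]:
    """Assumed rank: number of edges on the longest path from a task to a sink.

    Every task must appear as a key of `successors` (sinks map to []).
    """
    memo: dict[str, int] = {}

    def rank(task: str) -> int:
        if task not in memo:
            succ = successors[task]
            memo[task] = 0 if not succ else 1 + max(rank(s) for s in succ)
        return memo[task]

    for task in successors:
        rank(task)
    return memo


class HybridQueue:
    """Ready-task queue that switches between HRF and LIFO (hypothetical rule)."""

    def __init__(self, ranks: dict[str, int], threshold: int) -> None:
        self.ranks = ranks
        self.threshold = threshold  # assumption: e.g. number of worker cores
        self._tasks: list[str] = []  # insertion order; last item is the newest

    def __len__(self) -> int:
        return len(self._tasks)

    def push(self, task: str) -> None:
        self._tasks.append(task)

    def pop(self) -> str:
        if not self._tasks:
            raise IndexError("pop from empty queue")
        top = max(self.ranks[t] for t in self._tasks)
        highest = [i for i, t in enumerate(self._tasks) if self.ranks[t] == top]
        if len(highest) <= self.threshold:
            # HRF: only a few highest-rank tasks are left, so run one now
            # (the newest of them) instead of letting them trail.
            idx = highest[-1]
        else:
            # LIFO: take the newest ready task, whose input files were
            # most likely written recently and may still be in the page cache.
            idx = len(self._tasks) - 1
        return self._tasks.pop(idx)


def run_serial(successors: dict[str, list[str]], threshold: int) -> list[str]:
    """Execute the DAG on a single worker and return the task execution order."""
    ranks = compute_ranks(successors)
    pending: defaultdict[str, int] = defaultdict(int)  # unfinished predecessors
    for succ in successors.values():
        for s in succ:
            pending[s] += 1
    queue = HybridQueue(ranks, threshold)
    for task in successors:
        if pending[task] == 0:
            queue.push(task)
    order: list[str] = []
    while queue:
        task = queue.pop()
        order.append(task)
        for s in successors[task]:
            pending[s] -= 1
            if pending[s] == 0:
                queue.push(s)
    return order


# Two independent chains a1 -> b1 and a2 -> b2 that join at c.
WORKFLOW = {"a1": ["b1"], "a2": ["b2"], "b1": ["c"], "b2": ["c"], "c": []}

if __name__ == "__main__":
    print(run_serial(WORKFLOW, threshold=0))  # pure LIFO: ['a2', 'b2', 'a1', 'b1', 'c']
    print(run_serial(WORKFLOW, threshold=1))  # hybrid:    ['a2', 'a1', 'b1', 'b2', 'c']
```

With `threshold=0` the queue never takes the HRF branch, so it behaves as pure LIFO: b2 runs right after a2, while a2's output is likely still cached, but the rank-2 task a1 is deferred. With `threshold=1` the queue switches to HRF once a1 is the only remaining rank-2 task, so a1 starts before the downstream task b2 instead of trailing.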
Year
2014
DOI
10.1109/CLUSTER.2014.6968774
Venue
Cluster Computing
Keywords
cache storage, distributed processing, scheduling, workflow management software, Gfarm distributed file system, HRF policy, LIFO-order scheduling, Pwrake workflow system, buffer cache, copyfile workflow, disk cache hit rate, disk cache-aware task scheduling, highest rank first policy, input-output performance, last-in first-out policy, many-task computing, multiconstraint graph partitioning, page cache, scheduling strategy, trailing task problem, workflow scheduling, distributed file system, many task computing, task scheduling, workflow system
Field
I/O scheduling, Computer science, Cache, Two-level scheduling, Page cache, Real-time computing, Distributed computing, Fair-share scheduling, Parallel computing, Cache algorithms, Dynamic priority scheduling, Hybrid Scheduling, Operating system
DocType
Conference
ISSN
1552-5244
Citations
5
PageRank
0.54
References
13
Authors
2
Name
Order
Citations
PageRank
Masahiro Tanaka1567.00
Osamu Tatebe230942.94