Title
Planning spatial workflows to optimize grid performance
Abstract
In many scientific workflows, particularly those that operate on spatially oriented data, jobs that process adjacent regions of space often reference large numbers of files in common. Such workflows, when processed using workflow planning algorithms that are unaware of the application's file reference pattern, result in a huge number of redundant file transfers between grid sites and consequently perform poorly. This work presents a generalized approach to planning spatial workflow schedules for Grid execution based on the spatial proximity of files and the spatial range of jobs. We evaluate our solution to this problem using the file access pattern of an astronomy application that performs co-addition of images from the Sloan Digital Sky Survey. We show that, in initial tests on Grids of 5 to 25 sites, our spatial clustering approach eliminates 50% to 90% of the file transfers between Grid sites relative to the next-best planning algorithms we tested that were not "spatially aware". At moderate levels of concurrent file transfer, this reduction of redundant network I/O improves the application execution time by 30% to 70%, reduces Grid network and storage overhead and is broadly applicable to a wide range of spatially-oriented problems.
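As a rough illustration of why spatially aware planning cuts redundant transfers, the toy model below compares two job-to-site assignments. It is a minimal sketch under assumptions that are not taken from the paper: jobs lie along a one-dimensional strip, job i reads the files numbered i through i + files_per_job - 1 (so neighbouring jobs share files), each site fetches a given file at most once, and all function names are hypothetical. It does not reproduce the authors' planner or their measurements.

```python
from collections import defaultdict


def files_for_job(job, files_per_job):
    """Assumed access pattern: job i reads files i .. i + files_per_job - 1."""
    return set(range(job, job + files_per_job))


def count_transfers(assignment, files_per_job):
    """Count transfers, assuming each site fetches each distinct file once."""
    needed = defaultdict(set)
    for job, site in assignment.items():
        needed[site] |= files_for_job(job, files_per_job)
    return sum(len(files) for files in needed.values())


def round_robin(n_jobs, n_sites):
    """Spatially unaware baseline: neighbouring jobs land on different sites."""
    return {job: job % n_sites for job in range(n_jobs)}


def spatial_blocks(n_jobs, n_sites):
    """Spatially aware sketch: each site receives one contiguous run of jobs."""
    return {job: job * n_sites // n_jobs for job in range(n_jobs)}


n_jobs, n_sites, files_per_job = 100, 5, 5
naive = count_transfers(round_robin(n_jobs, n_sites), files_per_job)
clustered = count_transfers(spatial_blocks(n_jobs, n_sites), files_per_job)
print(naive, clustered)
```

In this toy setting the contiguous assignment needs far fewer transfers because files shared by adjacent jobs are fetched once per site rather than once per site that happens to receive one of the neighbours. The numbers depend entirely on the assumed overlap and site count and should not be read as the paper's results.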
Year
2006
DOI
10.1145/1141277.1141456
Venue
SAC
Keywords
file transfer, grid, performance, spatial workflow
Field
Computer science, Grid network, Schedule, Grid file, File transfer, Execution time, Cluster analysis, Workflow, Database, Grid, Distributed computing
DocType
Conference
ISBN
1-59593-108-2
Citations
15
PageRank
0.90
References
12
Authors
5
Name            Order   Citations   PageRank
Luiz Meyer      1       26          1.62
James Annis     2       15          0.90
Mike Wilde      3       351         22.09
Marta Mattoso   4       1287        109.83
Ian Foster      5       22938       2663.24