Title
Workload characterization in a high-energy data grid and impact on resource management
Abstract
The analysis of data usage in a large set of real traces from a high-energy physics collaboration revealed the existence of an emergent grouping of files that we coined “filecules”. This paper presents the benefits of using this file grouping for prestaging data and compares it with previously proposed file grouping techniques along a range of performance metrics. Our experiments with real workloads demonstrate that filecule grouping is a reliable and useful abstraction for data management in science Grids; that preserving time locality for data prestaging is highly recommended; that job reordering with respect to data availability has significant impact on throughput; and finally, that a relatively short history of traces is a good predictor for filecule grouping. Our experimental results provide lessons for workload modeling and suggest design guidelines for data management in data-intensive resource-sharing environments.
Year
2009
DOI
https://doi.org/10.1007/s10586-009-0081-3
Venue
Cluster Computing
Keywords
Grid computing, Workload characterization, Data management
Field
Resource management, Data mining, Locality, Grid computing, Data analysis, Workload, Computer science, Data grid, Real-time computing, Throughput, Data management, Database
DocType
Journal
Volume
12
Issue
2
ISSN
1386-7857
Citations
10
PageRank
0.61
References
41
Authors
3
Name
Order
Citations
PageRank
Adriana Iamnitchi12547222.35
Shyamala Doraimani2331.86
Gabriele Garzoglio3566.60