Title | ||
---|---|---|
MLOC: Multi-level Layout Optimization Framework for Compressed Scientific Data Exploration with Heterogeneous Access Patterns |
Abstract | ||
---|---|---|
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their runtime environments. The growing gap gets exacerbated by exploratory data-intensive analytics, such as querying simulation data for regions of interest with multivariate, spatio-temporal constraints. Query-driven data exploration induces heterogeneous access patterns that further stress the performance of the underlying storage system. To partially alleviate the problem, data reduction via compression and multi-resolution data extraction are becoming an integral part of I/O systems. While addressing the data size issue, these techniques introduce yet another mix of access patterns to a heterogeneous set of possibilities. Moreover, how extreme-scale datasets are partitioned into multiple files and organized on a parallel file system adds to an already combinatorial space of possible access patterns. To address this challenge, we present MLOC, a parallel Multilevel Layout Optimization framework for Compressed scientific spatio-temporal data at extreme scale. MLOC proposes multiple fine-grained data layout optimization kernels that form a generic core from which a broader constellation of such kernels can be organically consolidated to enable an effective data exploration with various combinations of access patterns. Specifically, the kernels are optimized for access patterns induced by (a) query-driven multivariate, spatio-temporal constraints, (b) precision-driven data analytics, (c) compression-driven data reduction, (d) multi-resolution data sampling, and (e) multi-file data partitioning and organization on a parallel file system. MLOC organizes these optimization kernels within a multi-level architecture, on which all the levels can be flexibly re-ordered by user-defined priorities.
When tested on query-driven exploration of compressed data, MLOC demonstrates a superior performance compared to any state-of-the-art scientific database management technologies. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/ICPP.2012.39 | ICPP |
Keywords | Field | DocType |
multi-resolution data, compressed scientific spatio-temporal data, heterogeneous access patterns, driven data analytics, multi-level layout optimization framework, exploratory data, data size issue, driven data reduction, access pattern, query-driven data exploration, data reduction, compressed scientific data exploration, effective data exploration, data compression, distributed databases, kernel, optimization, organizations, sampling methods, layout, throughput, data models | Data modeling, Data mining, File system, Data analysis, Computer science, Computer data storage, Parallel computing, Data extraction, Distributed database, Analytics, Data compression, Distributed computing | Conference |
Citations | PageRank | References |
4 | 0.40 | 0 |
Authors | ||
9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhenhuan Gong | 1 | 351 | 13.71 |
Terry Rogers | 2 | 18 | 1.46 |
John Jenkins | 3 | 4 | 0.40 |
Hemanth Kolla | 4 | 250 | 17.13 |
Stephane Ethier | 5 | 291 | 31.10 |
Jackie Chen | 6 | 80 | 4.62 |
Robert Ross | 7 | 2717 | 173.13 |
Scott Klasky | 8 | 1547 | 99.00 |
Nagiza F. Samatova | 9 | 861 | 74.04 |