Title | ||
---|---|---|
PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Patterns |
Abstract | ||
---|---|---|
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement while limiting the performance impact on running applications to a reasonable level. |
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/CCGrid.2013.58 | CCGrid |
Keywords | Field | DocType |
optimisation,parallel processing,xml,data-intensive analytics,storage management,multivariate constraint,scientific data exploration,spatio-temporal constraint,parallel run-time layout optimization,high-performance parallel i/o middleware,adios,parlo,middleware,multilevel data layout optimization,storage system,large-scale hpc application,light-weight layout optimization,heterogeneous access patterns,xml-based configuration,indexes,writing,optimization,layout | Middleware,Data layout,XML,Computer data storage,Computer science,Parallel computing,Search engine indexing,Scientific database,Analytics,Management system,Distributed computing | Conference |
ISSN | ISBN | Citations |
2376-4414 | 978-1-4673-6465-2 | 11 |
PageRank | References | Authors |
0.50 | 17 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhenhuan Gong | 1 | 351 | 13.71 |
David A. Boyuka II | 2 | 82 | 5.52 |
Xiaocheng Zou | 3 | 64 | 5.90 |
Qing Liu | 4 | 389 | 25.62 |
Norbert Podhorszki | 5 | 1046 | 83.84 |
Scott Klasky | 6 | 1547 | 99.00 |
Xiaosong Ma | 7 | 1117 | 68.36 |
Nagiza F. Samatova | 8 | 861 | 74.04 |