Title | ||
---|---|---|
DIRAQ: scalable in situ data- and resource-aware indexing for optimized query performance |
Abstract | ||
---|---|---|
Scientific data analytics in high-performance computing environments has been evolving along with the advancement of computing capabilities. With the onset of exascale computing, the increasing gap between compute performance and I/O bandwidth has rendered traditional post-simulation processing a tedious process. Despite the challenges due to increased data production, there exists an opportunity to benefit from "cheap" computing power to perform query-driven exploration and visualization during simulation time. To accelerate such analyses, applications traditionally augment, post-simulation, raw data with large indexes, which are then repeatedly utilized for data exploration. However, the generation of current state-of-the-art indexes involves compute- and memory-intensive processing, thus rendering them inapplicable in an in situ context. In this paper we propose DIRAQ, a parallel in situ, in-network data encoding and reorganization technique that enables the transformation of simulation output into a query-efficient form, with negligible runtime overhead to the simulation run. DIRAQ's effective core-local, precision-based encoding approach incorporates an embedded compressed index that is 3-6× smaller than current state-of-the-art indexing schemes. Its data-aware index adjustment improves performance of group-level index layout creation by up to 35% and reduces the size of the generated index by up to 27%. Moreover, DIRAQ's in-network index merging strategy enables the creation of aggregated indexes that speed up spatial-context query responses by up to 10× versus alternative techniques. DIRAQ's topology-, data-, and memory-aware aggregation strategy results in efficient I/O and yields overall end-to-end encoding and I/O time that is less than that required to write the raw data with MPI collective I/O. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/s10586-014-0358-z | Cluster Computing |
Keywords | Field | DocType |
Exascale computing,Indexing,Query processing,Compression | Exascale computing,Data analysis,Computer science,Visualization,Search engine indexing,Real-time computing,Rendering (computer graphics),Scalability,Speedup,Encoding (memory) | Journal |
Volume | Issue | ISSN |
17 | 4 | 1386-7857 |
Citations | PageRank | References |
3 | 0.37 | 27 |
Authors | ||
---|---|---|
9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sriram Lakshminarasimhan | 1 | 187 | 10.01 |
Xiaocheng Zou | 2 | 64 | 5.90 |
David A. Boyuka II | 3 | 82 | 5.52 |
Saurabh V. Pendse | 4 | 48 | 3.33 |
John Jenkins | 5 | 56 | 6.72 |
Venkatram Vishwanath | 6 | 507 | 47.27 |
Michael E. Papka | 7 | 953 | 138.69 |
Scott Klasky | 8 | 1547 | 99.00 |
Nagiza F. Samatova | 9 | 861 | 74.04 |