Title
ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying.
Abstract
High-performance computing architectures face nontrivial data processing challenges as computational and I/O components further diverge in their performance trajectories. For scientific data analysis in particular, methods based on generating heavyweight access acceleration structures, e.g., indexes, are becoming less feasible for ever-increasing dataset sizes. We present ALACRITY, demonstrating the effectiveness of a fused data and index encoding of scientific floating-point data in generating lightweight data structures amenable to the common types of queries used in scientific data analysis. We exploit the representation of floating-point values by extracting their significant bytes and using the resulting unique values to bin the remaining data along fixed-precision boundaries. We then build an inverted index that maps each generated bin to the list of records it contains, which lets us optimize the processing of queries with attribute range constraints. Overall, the storage footprint for both index and data is shown to be smaller than that of numerous bitmap indexing configurations, while query performance matches or exceeds theirs.
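The binning and inverted-index idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of two high-order bytes (`HIGH_BYTES`), the function names, and the in-memory dictionary layout are all assumptions made for illustration, and the sketch assumes finite float64 inputs (no NaN or infinity).

```python
import struct
from collections import defaultdict

# Assumption: the 2 high-order bytes (sign, exponent, top mantissa bits) form the bin key.
HIGH_BYTES = 2


def split(value):
    """Split a float64 into (high-order bytes, low-order bytes), big-endian."""
    raw = struct.pack(">d", value)
    return raw[:HIGH_BYTES], raw[HIGH_BYTES:]


def join(high, low):
    """Losslessly rebuild the float64 from its two byte groups."""
    return struct.unpack(">d", high + low)[0]


def build_index(values):
    """Inverted index: bin key -> list of (record id, low-order bytes)."""
    bins = defaultdict(list)
    for rid, value in enumerate(values):
        high, low = split(value)
        bins[high].append((rid, low))
    return dict(bins)


def bin_bounds(high):
    """Smallest and largest finite float64 that share these high-order bytes."""
    pad = 8 - HIGH_BYTES
    a = join(high, b"\x00" * pad)
    b = join(high, b"\xff" * pad)
    # For negative keys the all-zero suffix is the larger value, so order the pair.
    return (a, b) if a <= b else (b, a)


def range_query(index, lo, hi):
    """Return sorted record ids whose value v satisfies lo <= v <= hi."""
    hits = []
    for high, records in index.items():
        bmin, bmax = bin_bounds(high)
        if bmax < lo or bmin > hi:
            continue  # bin lies entirely outside the range
        if lo <= bmin and bmax <= hi:
            # Bin lies entirely inside the range: no values need decoding.
            hits.extend(rid for rid, _ in records)
        else:
            # Boundary bin: decode each value and test it individually.
            hits.extend(rid for rid, low in records if lo <= join(high, low) <= hi)
    return sorted(hits)
```

In this sketch, only bins that straddle a query boundary require reconstructing individual values; bins fully covered by the range are answered from the index alone, which is the property that makes the fused encoding attractive for range constraints.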
Year
2013
DOI
10.1007/978-3-642-41221-9_4
Venue
Lecture Notes in Computer Science
Field
Inverted index, Data mining, External Data Representation, Data analysis, Computer science, Range query (data structures), Search engine indexing, Compression ratio, Analytics, Lossless compression
DocType
Journal
Volume
8220
ISSN
0302-9743
Citations
7
PageRank
0.47
References
15
Authors
13
Name                     Order  Citations  PageRank
John Jenkins             1      7          0.47
Isha Arkatkar            2      60         2.68
Sriram Lakshminarasimhan 3      187        10.01
David A. Boyuka II       4      82         5.52
Eric R. Schendel         5      61         5.02
Neil Shah                6      323        24.15
Stéphane Ethier          7      95         8.15
Choong-Seock Chang       8      72         6.56
Jackie Chen              9      80         4.62
Hemanth Kolla            10     250        17.13
Scott Klasky             11     1547       99.00
Robert Ross              12     2717       173.13
Nagiza F. Samatova       13     861        74.04