Title
Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data
Abstract
Modern large-scale scientific simulations running on HPC systems generate data on the order of terabytes during a single run. To lessen the I/O load during a simulation run, scientists are forced to capture data infrequently, thereby making data collection an inherently lossy process. Yet lossless compression techniques are hardly suitable for scientific data due to its inherently random nature; for the applications used here, they achieve a compression rate of less than 10%. They also impose significant overhead during decompression, making them unsuitable for data analysis and visualization that require repeated data access. To address this problem, we propose an effective method for In-situ Sort-And-B-spline Error-bounded Lossy Abatement (ISABELA) of scientific data that is widely regarded as effectively incompressible. With ISABELA, we apply a preconditioner to seemingly random and noisy data along spatial resolution to achieve an accurate fitting model that guarantees a ≥ 0.99 correlation with the original data. We further take advantage of temporal patterns in scientific data to compress data by ≈ 85%, while introducing only a negligible runtime overhead on simulations. ISABELA significantly outperforms existing lossy compression methods, such as wavelet compression. Moreover, besides being a communication-free and scalable compression technique, ISABELA is an inherently local decompression method, that is, it does not need to decode the entire data set, which makes it attractive for random access.
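The sort-then-fit idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes piecewise-linear interpolation over evenly spaced samples for the B-spline fit, and it omits the error-bounding and temporal stages. The function names (`compress`, `decode_one`, `pearson`), the window of 1024 Gaussian values, and the 65 sample points are all hypothetical choices made for the example.

```python
import bisect
import math
import random


def compress(window, n_knots):
    """Sort a window; keep each element's rank plus a few samples of the sorted curve."""
    n = len(window)
    if not 2 <= n_knots <= n:
        raise ValueError("need 2 <= n_knots <= len(window)")
    # order[r] is the original index of the r-th smallest value.
    order = sorted(range(n), key=window.__getitem__)
    # rank[i] is the position of original element i in the sorted sequence.
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    # Evenly spaced, strictly increasing sample positions; both ends included.
    knot_pos = [k * (n - 1) // (n_knots - 1) for k in range(n_knots)]
    knot_val = [window[order[p]] for p in knot_pos]
    return rank, knot_pos, knot_val


def decode_one(i, rank, knot_pos, knot_val):
    """Approximate the value at original index i from its rank and two nearby samples."""
    r = rank[i]
    # Locate the segment [knot_pos[lo], knot_pos[hi]] that contains rank r.
    hi = min(bisect.bisect_right(knot_pos, r), len(knot_pos) - 1)
    lo = hi - 1
    t = (r - knot_pos[lo]) / (knot_pos[hi] - knot_pos[lo])
    # Linear interpolation stands in for evaluating a fitted B-spline (assumption).
    return knot_val[lo] + t * (knot_val[hi] - knot_val[lo])


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical demo data: one window of noisy values.
random.seed(0)
window = [random.gauss(0.0, 1.0) for _ in range(1024)]
rank, knot_pos, knot_val = compress(window, 65)
approx = [decode_one(i, rank, knot_pos, knot_val) for i in range(len(window))]
corr = pearson(window, approx)
print(f"correlation with original: {corr:.4f}")
```

Sorting turns the noisy window into a monotone curve, which a handful of samples can describe closely. In this sketch the `rank` array has to be stored alongside the samples. Because `decode_one` reads only the rank of element `i` and its two neighboring samples, a single value can be recovered without reconstructing the rest of the window, which is the kind of local decoding the abstract refers to.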
Year
2011
DOI
10.1007/978-3-642-23400-2_34
Venue
Euro-Par (1)
Keywords
original data, data collection, data analysis, spatio-temporal data, entire data, data infrequently, in-situ reduction, compress data, noisy data, scientific data, data access, wavelet compression
Field
Data compression ratio, Lossy compression, Computer science, Parallel computing, Temporal database, Data compression, Data access, Distributed computing, Wavelet transform, Random access, Lossless compression
DocType
Conference
Volume
6852
ISSN
0302-9743
Citations
56
PageRank
2.30
References
10
Authors
7
Name                       Order   Citations   PageRank
Sriram Lakshminarasimhan   1       187         10.01
Neil Shah                  2       323         24.15
Stephane Ethier            3       291         31.10
Scott Klasky               4       1547        99.00
Rob Latham                 5       236         11.68
Robert Ross                6       2717        173.13
Nagiza F. Samatova         7       861         74.04