Title
Lossless Compression of Time Series Data with Generalized Deduplication
Abstract
To provide compressed storage for large amounts of time series data, we present a new strategy for data deduplication. Rather than attempting to deduplicate entire data chunks, we employ a generalized approach, where each chunk is split into a part worth deduplicating and a part that must be stored directly. This simple principle enables greater compression of the often similar, but non-identical, chunks of time series data than classic deduplication achieves, while keeping benefits such as scalability, robustness, and on-the-fly storage, retrieval, and search for chunks. We analyze the method's theoretical performance and argue that it can asymptotically approach the entropy limit for some data configurations. Finally, to validate the method's practical merits, we show that it is competitive with popular universal compression algorithms on the MIT-BIH ECG Compression Test Database.
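The split-then-deduplicate principle described in the abstract can be sketched in a few lines of Python. The sketch below is a minimal illustration under assumed parameters, not the paper's implementation: it takes fixed 8-byte chunks, treats the leading bytes as the deduplicable base and the trailing 2 bytes as the deviation stored directly; the paper's actual chunk transform is not given in this record.

# Minimal sketch of generalized deduplication: each chunk is split into a
# "base" (deduplicated via a dictionary of unique bases) and a "deviation"
# (stored directly per chunk). The byte-position split used here is an
# assumption for illustration only.

from typing import Dict, List, Tuple

CHUNK_SIZE = 8   # bytes per chunk (assumed)
DEV_SIZE = 2     # trailing bytes treated as the deviation (assumed)

def compress(data: bytes) -> Tuple[List[bytes], List[Tuple[int, bytes]]]:
    """Return (base dictionary, per-chunk (base_id, deviation) pairs)."""
    bases: List[bytes] = []
    index: Dict[bytes, int] = {}          # base -> id, for O(1) deduplication
    encoded: List[Tuple[int, bytes]] = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        base, deviation = chunk[:-DEV_SIZE], chunk[-DEV_SIZE:]
        if base not in index:             # store each distinct base only once
            index[base] = len(bases)
            bases.append(base)
        encoded.append((index[base], deviation))
    return bases, encoded

def decompress(bases: List[bytes], encoded: List[Tuple[int, bytes]]) -> bytes:
    """Lossless inverse: rejoin each base with its stored deviation."""
    return b"".join(bases[bid] + dev for bid, dev in encoded)

# Similar-but-not-identical chunks (e.g., slowly varying time series samples):
# classic whole-chunk deduplication finds no duplicates here, while the
# generalized approach stores one shared base plus small deviations.
samples = b"".join(bytes([10, 11, 10, 12, 10, 11, 10, k]) for k in range(8))
bases, encoded = compress(samples)
assert decompress(bases, encoded) == samples
print(f"{len(bases)} unique base(s) for {len(encoded)} chunks")

In this toy run, the eight chunks differ only in their final byte, so classic deduplication would store all eight, whereas the split yields a single base plus eight short deviations; storage, retrieval, and lookup remain per-chunk operations.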
Year
2019
DOI
10.1109/GLOBECOM38437.2019.9013957
Venue
2019 IEEE Global Communications Conference (GLOBECOM)
Keywords
entropy, data chunks, universal compression algorithms, compressed storage, generalized deduplication, lossless compression, MIT-BIH ECG compression test database, data configurations, time series data, data deduplication
DocType
Conference
ISSN
1930-529X
ISBN
978-1-7281-0963-3
Citations
0
PageRank
0.34
References
6
Authors
3
Name | Order | Citations | PageRank
Rasmus Vestergaard | 1 | 1 | 3.40
Qi Zhang | 2 | 13 | 7.05
Daniel E. Lucani | 3 | 236 | 42.29