Title
To Zip or not to Zip: effective resource usage for real-time compression
Abstract
Real-time compression for primary storage is quickly becoming widespread as data continues to grow exponentially, but adding compression on the data path consumes scarce CPU and memory resources on the storage system. Our work aims to mitigate this cost by introducing methods to quickly and accurately identify the data that will yield significant space savings when compressed. The first level of filtering that we employ is at the data set level (e.g., volume or file system), where we estimate the overall compressibility of the data at rest. According to the outcome, we may choose to enable or disable compression for the entire data set, or to employ a second level of finer-grained filtering. The second filtering scheme examines data being written to the storage system in an online manner and determines its compressibility. The first-level filtering runs in mere minutes while providing mathematically proven guarantees on its estimates. In addition to aiding in selecting which volumes to compress, it has been released as a public tool, allowing potential customers to determine the effectiveness of compression on their data and to aid in capacity planning. The second-level filtering has shown significant CPU savings (up to 35%) while maintaining compression savings (within 2%).
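The two filtering levels described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the sampling scheme, the estimator, or the online heuristics, and it gives no details of the proven guarantees. The function names `estimate_ratio` and `worth_compressing`, the chunk size, the sample count, the prefix length, and the threshold are all hypothetical choices made for illustration, with zlib standing in for whatever compressor the storage system actually uses.

```python
import os
import random
import zlib


def estimate_ratio(data: bytes, chunk_size: int = 4096,
                   samples: int = 64, seed: int = 0) -> float:
    """Data-set level (illustrative): estimate compressed/original size
    by compressing a random sample of fixed-size chunks."""
    n_chunks = max(1, len(data) // chunk_size)
    rng = random.Random(seed)
    original = compressed = 0
    for _ in range(samples):
        i = rng.randrange(n_chunks)  # sample chunk indices with replacement
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        original += len(chunk)
        compressed += len(zlib.compress(chunk))
    return compressed / original if original else 1.0


def worth_compressing(chunk: bytes, prefix: int = 512,
                      threshold: float = 0.9) -> bool:
    """Write-path level (illustrative): compress only a short prefix of the
    incoming chunk and skip full compression if the prefix barely shrinks."""
    head = chunk[:prefix]
    if not head:
        return False
    return len(zlib.compress(head)) / len(head) < threshold


if __name__ == "__main__":
    text_like = b"abc" * 100_000      # highly repetitive, compresses well
    random_like = os.urandom(300_000)  # effectively incompressible
    print("repetitive data ratio:", estimate_ratio(text_like))
    print("random data ratio:    ", estimate_ratio(random_like))
    print("compress repetitive chunk?", worth_compressing(text_like[:4096]))
    print("compress random chunk?    ", worth_compressing(random_like[:4096]))
```

A low estimated ratio would suggest enabling compression for the whole data set, a ratio near 1.0 would suggest disabling it, and intermediate values are where a per-write check such as `worth_compressing` could decide chunk by chunk.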
Year: 2013
Venue: FAST
Keywords: scarce cpu, real-time compression, disable compression, entire data, data path, compression saving, storage system, overall compressibility, effective resource usage, significant cpu saving, primary storage
Field: Compressibility, Compression (physics), File system, Time compression, Computer data storage, Computer science, Filter (signal processing), Capacity planning, Real-time computing, Fold (higher-order function)
DocType: Conference
Citations: 15
PageRank: 1.45
References: 8
Authors: 5
Name              Order  Citations  PageRank
Danny Harnik      1      488        26.27
Ronen Kat         2      27         3.48
Oded Margalit     3      228        18.85
Dmitry Sotnikov   4      66         6.55
Avishay Traeger   5      281        16.07