Title
An approximation to the greedy algorithm for differential compression
Abstract
We present a new differential compression algorithm that combines the hash value techniques and suffix array techniques of previous work. The term “differential compression” refers to encoding a file (a version file) as a set of changes with respect to another file (a reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have certain drawbacks; namely, they do not find the best matches for every offset of the version file. Our algorithm, hsadelta (hash suffix array delta), finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The algorithm has two variations depending on how we choose the block size. We show that if the block size is kept fixed, the compression performance of the algorithm is similar to that of the greedy algorithm, without the associated expensive space and time requirements. If the block size is varied linearly with the reference file size, the algorithm can run in linear time and constant space. We also show empirically that the algorithm performs better than other state-of-the-art differential compression algorithms in terms of compression and is comparable in speed.
Year
DOI
Venue
2006
10.1147/rd.501.0149
IBM Journal of Research and Development
Keywords
Field
DocType
state-of-the-art differential compression algorithm,version file,best match,block size,reference file size,previous differential compression algorithm,greedy algorithm,linear time,reference file,new differential compression algorithm,compression algorithm
Block size,Computer science,Algorithm,File size,Suffix array,Greedy algorithm,Hash function,Time complexity,Data compression,Offset (computer science)
Journal
Volume
Issue
ISSN
50
1
0018-8646
Citations 
PageRank 
References 
5
0.48
14
Authors
4
Name
Order
Citations
PageRank
R. C. Agarwal120129.01
Gupta, K.2171.48
Jain, S.3131.08
Amalapurapu, S.450.48