Title
Note on the greedy parsing optimality for dictionary-based text compression
Abstract
Dynamic dictionary-based compression schemes are the most daily used data compression schemes since they appeared in the foundational paper of Ziv and Lempel in 1977, commonly referred to as LZ77. In dynamic setting, LZ77 considers a portion of the previous text as a dictionary and it uses a greedy approach to select, at each step, the longest match between the text and the dictionary. Compression is achieved by replacing matches with encoded dictionary pointers. LZ77 is the base of gzip, zip, rar, 7zip and many others compression software. All these compression schemes use variants of the greedy approach to parse (or factorise) the text into dictionary phrases. Greedy parsing optimality with respect to the number of phrases was proved by Storer et al. (1982) for unbounded LZ77-based dictionaries and by Cohn et al. (1996) for static suffix-closed dictionaries. The optimality of the greedy parsing was never proved for bounded size dictionary which is actually required by all of these practical schemes. In this article, we define the suffix-closed property for dynamic dictionaries, and we show that LZ77-based compression schemes, including the bounded dictionary variants, satisfy this property. Under this condition we prove the optimality of the greedy parsing as a variant of the proof by Cohn et al.
Year
DOI
Venue
2012
10.1016/j.tcs.2014.01.013
Theor. Comput. Sci.
Keywords
DocType
Volume
greedy approach,dictionary-based text compression,LZ77-based compression scheme,bounded size dictionary,compression software,bounded dictionary variant,greedy parsing,LZ77-based dictionary,dictionary phrase,greedy parsing optimality,data compression scheme,compression scheme
Journal
525,
ISSN
Citations 
PageRank 
0304-3975
6
0.51
References 
Authors
14
3
Name
Order
Citations
PageRank
Maxime Crochemore12655281.75
Alessio Langiu2466.52
Filippo Mignosi356999.71