Title
A suboptimal lossy data compression based on approximate pattern matching
Abstract
A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. This scheme is based on approximate string matching, and it naturally extends the lossless Lempel-Ziv (1977) data compression scheme. Among others we consider the typical length of an approximately repeated pattern within the first n positions of a stationary mixing sequence where D percent of mismatches is allowed. We prove that there exists a constant r0(D) such that the length of such an approximately repeated pattern converges in probability to 1/r0(D) log n (pr.) but it almost surely oscillates between 1/r-∞(D) log n and 2/r1(D) log n, where r -∞(D)>r0(D)>r1(D)/2 are some constants. These constants are natural generalizations of Renyi entropies to the lossy environment. More importantly, we show that the compression ratio of a lossy data compression scheme based on such an approximate pattern matching is asymptotically equal to r0(D). We also establish the asymptotic behavior of the so-called approximate waiting time Nl which is defined as the time until a pattern of length C repeats approximately for the first time. We prove that log Nl/l→r0(D) (pr.) as l→∞. In general, r0(D)>R(D) where R(D) is the rate distortion function. Thus for stationary mixing sequences we settle in the negative the problem investigated by Steinberg and Gutman by showing that a lossy extension of the Wyner-Ziv (1989) scheme cannot be optimal
Year
DOI
Venue
1997
10.1109/18.623143
IEEE Transactions on Information Theory
Keywords
Field
DocType
approximation theory,data compression,encoding,entropy,pattern matching,probability,rate distortion theory,sequences,string matching,Renyi entropies,approximate pattern matching,approximate string matching,approximate waiting time,asymptotic behavior,compression ratio,lossless Lempel-Ziv data compression,lossy environment,probability,rate distortion function,stationary mixing sequence,stationary mixing sequences,suboptimal algorithm,suboptimal lossy data compression,variable source coding algorithm
Binary logarithm,Discrete mathematics,Combinatorics,Lossy compression,Approximate string matching,Almost surely,Data compression,Rate–distortion theory,Asymptotic analysis,Mathematics,Lossless compression
Journal
Volume
Issue
ISSN
43
5
0018-9448
Citations 
PageRank 
References 
52
4.27
23
Authors
3
Name
Order
Citations
PageRank
Luczak, T.1524.27
W. Szpankowski238131.59
Tomasz Łuczak322540.26