Title
Efficient regular expression matching on LZ77 compressed strings using negative factors
Abstract
The state-of-the-art approaches for regular expression matching on LZ78 compressed strings do not perform efficiently. Moreover, LZ78 compression has some shortcomings, such as higher compression ratio and slower decompression speed than LZ77 (a variant of LZ78). In this paper, we study regular expression matching on LZ77 compressed strings. To address this problem, we propose an efficient algorithm, namely, RELZ, utilizing the positive factors, i.e., the prefix and the suffix, and negative factors (Negative factors are substrings that cannot appear in an answer.) of the regular expression to prune the candidates. For the sake of quickly locating these two kinds of factors on the compressed string without decompression, we design a variant of suffix trie index, called SSLZ. We construct bitmaps for factors of regular expression to detect candidates. Moreover, due to the high space cost of SSLZ, we propose a variant index that partially maintain suffixes of the phrases with high frequency and develop an efficient regular expression algorithm based on the novel index, namely, RELZ+. In addition, two optimization strategies employing block filtering and LZ filtering are proposed to prune false negative candidates. At last, we conduct a comprehensive performance evaluation depending on four real data sets to validate our ideas and the proposed algorithms. The experimental results show that our RELZ and RELZ+ algorithms significantly outperform the existing algorithms.
Year
DOI
Venue
2019
10.1007/s11280-019-00667-z
World Wide Web
Keywords
Field
DocType
Regular expression, Compressed string, LZ77, Negative factor
Data mining,Regular expression,Substring,Suffix,Computer science,Algorithm,Filter (signal processing),Prefix,Compression ratio,Bitmap,Trie
Journal
Volume
Issue
ISSN
22
6
1573-1413
Citations 
PageRank 
References 
0
0.34
0
Authors
5
Name
Order
Citations
PageRank
Yutong Han100.34
Bin Wang2427.78
Xiaochun Yang344052.12
Tao Qiu4184.42
Huaijie Zhu500.34