Title
Approximate Frequent Pattern Discovery In Compressed Space
Abstract
A grammar compression is a restricted context-free grammar (CFG) that derives a single string deterministically. The goal of a grammar compression algorithm is to develop a smaller CFG by finding and removing duplicate patterns, which is simply a frequent pattern discovery process. Any frequent pattern can be obtained in linear time; however, a huge working space is required for longer patterns, and the entire string must be preloaded into memory. We propose an online algorithm to address this problem approximately within compressed space. For an input sequence of symbols, a(1), a(2), . . ., let G(i) be a grammar compression for the string a(1)a(2) . . . a(i). In this study, an online algorithm is considered one that can compute G(i+1) from (G(i), a(i+1)) without explicitly decompressing G(i). Here, let G be a grammar compression for string S. We say that variable X approximates a substring P of S within approximation ratio delta iff for any interval [i, j] with P = S [i, j], the parse tree of G has a node labeled with X that derives S [l, r] for a subinterval [l, r] of [i, j] satisfying vertical bar[l, r]vertical bar >= delta vertical bar[ i, j]vertical bar. Then, G solves the frequent pattern discovery problem approximately within delta iff for any frequent pattern P of S, there exists a variable that approximates P within delta. Here, delta is called the approximation ratio of G for S. Previously, the best approximation ratio obtained by a polynomial time algorithm was Omega(1/lg(2) vertical bar P vertical bar). The main contribution of this work is to present a new lower bound Omega(1/lg* vertical bar S vertical bar lg vertical bar P vertical bar) that is smaller than the previous bound when lg* vertical bar S vertical bar < lg vertical bar P vertical bar. Experimental results demonstrate that the proposed algorithm extracts sufficiently long frequent patterns and significantly reduces memory consumption compared to the offline algorithm in the previous work.
Year
DOI
Venue
2018
10.1587/transinf.2017FCP0010
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
Keywords
Field
DocType
grammar compression, online algorithm, approximate frequent pattern discovery
Computer vision,Pattern recognition,Computer science,Artificial intelligence
Journal
Volume
Issue
ISSN
E101D
3
1745-1361
Citations 
PageRank 
References 
0
0.34
0
Authors
4
Name
Order
Citations
PageRank
Shouhei Fukunaga100.68
Yoshimasa Takabatake2297.27
Tomohiro I314822.06
Hiroshi Sakamoto4476.63