Abstract | ||
---|---|---|
We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where $k < \\frac{1}{3} m$ is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that our lean-log algorithm has also in practice superior performance and is surprisingly competitive with RE tools not performing full parsing, such as Grep. |
Year | DOI | Venue |
---|---|---|
2013 | 10.1007/978-3-642-39274-0_7 | CIAA |
Keywords | DocType | Citations |
size n,kn bit,lean-log algorithm,two-pass greedy regular expression,time o,log storage,size m,greedy parses,compact bit-coded parse tree,new algorithm,input string | Conference | 3 |
PageRank | References | Authors |
0.39 | 10 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Niels Bjørn Bugge Grathwohl | 1 | 11 | 2.21 |
Fritz Henglein | 2 | 688 | 69.17 |
Lasse Nielsen | 3 | 21 | 1.82 |
Ulrik Terp Rasmussen | 4 | 8 | 1.47 |