Title
Two-Pass greedy regular expression parsing
Abstract
We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where $k < \\frac{1}{3} m$ is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that our lean-log algorithm has also in practice superior performance and is surprisingly competitive with RE tools not performing full parsing, such as Grep.
Year
DOI
Venue
2013
10.1007/978-3-642-39274-0_7
CIAA
Keywords
DocType
Citations 
size n,kn bit,lean-log algorithm,two-pass greedy regular expression,time o,log storage,size m,greedy parses,compact bit-coded parse tree,new algorithm,input string
Conference
3
PageRank 
References 
Authors
0.39
10
4
Name
Order
Citations
PageRank
Niels Bjørn Bugge Grathwohl1112.21
Fritz Henglein268869.17
Lasse Nielsen3211.82
Ulrik Terp Rasmussen481.47