Abstract | ||
---|---|---|
Matching a regular expression (regex) on a text is widely used in many applications, such as text editing, information extraction and instruction detection (IDS). Traditional algorithms generally compile an equivalent automaton from the regex query, then run it on the text to find all matching results. However, they have to scale linearly with the size of the text. Recent algorithms utilize various filtering techniques to quickly jump to candidate positions in a text where a matching result may appear, then only these candidate positions are verified by the automaton. In this paper, we give a full specification on filtering techniques for the regex matching problem, in which filters for the regex query can be classified into positive factor and negative factor. We review three typical positive factors, including prefix, suffix, and necessary factor and show that negative factors can collaborate with positive factors to significantly improve the filtering ability. |
Year | Venue | Field |
---|---|---|
2018 | DASFAA Workshops | Data mining,Regular expression,Suffix,Computer science,Automaton,Filter (signal processing),Compiler,Prefix,Theoretical computer science,Information extraction,Jump |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
6 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tao Qiu | 1 | 18 | 4.42 |
Xiaochun Yang | 2 | 440 | 52.12 |
Bin Wang | 3 | 1788 | 246.68 |