Title | ||
---|---|---|
Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation. |
Abstract | ||
---|---|---|
Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension badly affects the complexity of basic operations, and, especially, makes membership NP-hard, which is unacceptable in most practical scenarios.
In this article, we study the problem of membership checking for a restricted class of these extended REs, called conflict-free REs, which are expressive enough to cover the vast majority of real-world applications. We present several polynomial algorithms for membership checking over conflict-free REs. The algorithms are all polynomial and differ in terms of adopted optimization techniques and in the kind of supported operators. As a particular application, we generalize the approach to check membership of Extensible Markup Language trees into a class of EDTDs (Extended Document Type Definitions) that models the crucial aspects of DTDs (Document Type Definitions) and XSD (XML Schema Definitions) schemas.
Results about an extensive experimental analysis validate the efficiency of the presented membership checking techniques.
|
Year | DOI | Venue |
---|---|---|
2017 | 10.1145/3132701 | ACM Trans. Database Syst. |
Keywords | Field | DocType |
Regular expressions, XML, word membership | Data mining,Regular expression,Programming language,XML,Polynomial,Computer science,Theoretical computer science,XML schema,Concatenation,Operator (computer programming),Time complexity,Document type definition | Journal |
Volume | Issue | ISSN |
42 | 4 | 0362-5915 |
Citations | PageRank | References |
3 | 0.37 | 27 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dario Colazzo | 1 | 283 | 34.21 |
Giorgio Ghelli | 2 | 1300 | 255.19 |
Carlo Sartiani | 3 | 240 | 28.54 |