Title
Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation.
Abstract
Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension badly affects the complexity of basic operations, and, especially, makes membership NP-hard, which is unacceptable in most practical scenarios. In this article, we study the problem of membership checking for a restricted class of these extended REs, called conflict-free REs, which are expressive enough to cover the vast majority of real-world applications. We present several polynomial algorithms for membership checking over conflict-free REs. The algorithms are all polynomial and differ in terms of adopted optimization techniques and in the kind of supported operators. As a particular application, we generalize the approach to check membership of Extensible Markup Language trees into a class of EDTDs (Extended Document Type Definitions) that models the crucial aspects of DTDs (Document Type Definitions) and XSD (XML Schema Definitions) schemas. Results about an extensive experimental analysis validate the efficiency of the presented membership checking techniques.
Year
DOI
Venue
2017
10.1145/3132701
ACM Trans. Database Syst.
Keywords
Field
DocType
Regular expressions, XML, word membership
Data mining,Regular expression,Programming language,XML,Polynomial,Computer science,Theoretical computer science,XML schema,Concatenation,Operator (computer programming),Time complexity,Document type definition
Journal
Volume
Issue
ISSN
42
4
0362-5915
Citations 
PageRank 
References 
3
0.37
27
Authors
3
Name
Order
Citations
PageRank
Dario Colazzo128334.21
Giorgio Ghelli21300255.19
Carlo Sartiani324028.54