Title
Recursive Programs for Document Spanners.
Abstract
A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (a string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are obtained by adding capture variables to regular expressions. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (which extract from the input document the relations that play the role of EDBs). In this paper, we investigate the expressive power of recursive Datalog over regex formulas. Our main result is that such programs capture precisely the document spanners computable in polynomial time. Additional results compare recursive programs to known formalisms such as the language of core spanners (which extends regular spanners with tests for string equality) and its closure under difference. Finally, we extend our main result to a recently proposed framework that generalizes both the relational model and document spanners.
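The notions of span, regex formula, and string-equality test used in the abstract can be illustrated with a minimal Python sketch. This is not code from the paper. The function names `extract` and `equal_pairs` are hypothetical, spans are written as 0-based half-open pairs `(i, j)` for convenience, and the brute-force enumeration is chosen for clarity rather than efficiency.

```python
import re


def extract(doc, pattern):
    """Return every span (i, j) of doc whose substring doc[i:j] fully matches pattern.

    This plays the role of a regex formula with a single capture variable x
    that may be preceded and followed by arbitrary text.
    """
    compiled = re.compile(pattern)
    return {
        (i, j)
        for i in range(len(doc) + 1)
        for j in range(i, len(doc) + 1)
        if compiled.fullmatch(doc[i:j])
    }


def equal_pairs(doc, spans):
    """Select pairs of distinct spans that cover equal substrings.

    This mimics the string-equality test that core spanners add to
    regular spanners; each unordered pair is reported once (x < y).
    """
    return {
        (x, y)
        for x in spans
        for y in spans
        if x < y and doc[x[0]:x[1]] == doc[y[0]:y[1]]
    }


doc = "ab ab b"
words = extract(doc, r"[a-z]+")   # unary relation of spans over lowercase runs
pairs = equal_pairs(doc, words)   # binary relation: different spans, same string
```

Here `words` acts as an extracted (EDB-like) relation, and `equal_pairs` is one selection that a rule could apply on top of it. For example, `((0, 2), (3, 5))` is in `pairs` because both spans cover the string `ab`.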
Year
2017
Venue
International Conference on Database Theory
Field
Data mining, Regular expression, Computer science, Theoretical computer science, Information extraction, Relational algebra, Spanner, Time complexity, Relational model, Datalog, Recursion
DocType
Journal
Volume
abs/1712.08198
Citations
1
PageRank
0.35
References
15
Authors
4
Name              Order  Citations  PageRank
Liat Peterfreund  1      11         3.88
Balder Ten Cate   2      630        51.21
Ronald Fagin      3      8808       2643.66
Benny Kimelfeld   4      1034       71.63