Title
A Logic for Document Spanners.
Abstract
Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which formalize the query language AQL that is used in IBM’s SystemT. As shown by Freydenberger and Holldack (ICDT 2016, ToCS 2018), there is a connection between core spanners and \(\phantom {\dot {i}\!}\mathsf {EC}^{\text {reg}}\), the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining \(\phantom {\dot {i}\!}\mathsf {SpLog}\), a fragment of \(\phantom {\dot {i}\!}\mathsf {EC}^{\text {reg}}\) that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between \(\phantom {\dot {i}\!}\mathsf {SpLog}\) and core spanners. Consequences and applications include an alternative way of defining relations for spanners, a pumping lemma for core spanners, and insights into the relative succinctness of various classes of spanner representations and their connection to graph querying languages. We also briefly discuss the connection between \(\phantom {\dot {i}\!}\mathsf {SpLog}\) with negation and core spanners with a difference operator.
Year
DOI
Venue
2017
10.4230/LIPIcs.ICDT.2017.13
ICDT
Keywords
Field
DocType
Information extraction, Document spanners, Word equations, xregex, Descriptional complexity, CRPQs with string equality
Discrete mathematics,Combinatorics,Regular expression,Computer science,Succinctness,Automaton,Equivalence (measure theory),Concatenation,Pumping lemma for regular languages,Time complexity,Spanner
Conference
Volume
Issue
ISSN
68
7
1433-0490
Citations 
PageRank 
References 
6
0.45
0
Authors
1
Name
Order
Citations
PageRank
Dominik D. Freydenberger1899.14