Title
The SystemT IDE: an integrated development environment for information extraction rules
Abstract
Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.
Year
2011
DOI
10.1145/1989323.1989479
Venue
SIGMOD Conference
Keywords
systemt ide, integrated development environment, high-quality information extraction rule, state-of-the-art rule-based ie system, enterprise application, rule-based ie system, computing data provenance, high-quality ie rule, high-quality ie, data management, semantic search, machine learning, regular expression, rule based, information extraction
Field
Data mining, Regular expression, IBM, Business analytics, Suite, Semantic search, Iterative and incremental development, Computer science, Information extraction, Data management, Database
DocType
Conference
Citations
9
PageRank
0.53
References
7
Authors (13)
Name                       Order  Citations  PageRank
Laura Chiticariu           1      757        41.60
Vivian Chu                 2      74         4.67
Sajib Dasgupta             3      291        13.88
Thilo W. Goetz             4      9          0.53
Howard Ho                  5      337        19.47
Rajasekar Krishnamurthy    6      1214       71.86
Alexander Lang             7      9          0.53
Yunyao Li                  8      530        37.81
Bin Liu                    9      138        7.54
Sriram Raghavan            10     1096       97.25
Frederick R. Reiss         11     371        17.91
Shivakumar Vaithyanathan   12     2518       234.40
Huaiyu Zhu                 13     62         3.08