Title | ||
---|---|---|
The SystemT IDE: an integrated development environment for information extraction rules |
Abstract | ||
---|---|---|
Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/1989323.1989479 | SIGMOD Conference |
Keywords | Field | DocType |
systemt ide,integrated development environment,high-quality information extraction rule,state-of-the-art rule-based ie system,enterprise application,rule-based ie system,computing data provenance,high-quality ie rule,high-quality ie,data management,semantic search,machine learning,regular expression,rule based,information extraction | Data mining,Regular expression,IBM,Business analytics,Suite,Semantic search,Iterative and incremental development,Computer science,Information extraction,Data management,Database | Conference |
Citations | PageRank | References |
9 | 0.53 | 7 |
Authors | ||
13 |
Name | Order | Citations | PageRank |
---|---|---|---|
Laura Chiticariu | 1 | 757 | 41.60 |
Vivian Chu | 2 | 74 | 4.67 |
Sajib Dasgupta | 3 | 291 | 13.88 |
Thilo W. Goetz | 4 | 9 | 0.53 |
Howard Ho | 5 | 337 | 19.47 |
Rajasekar Krishnamurthy | 6 | 1214 | 71.86 |
Alexander Lang | 7 | 9 | 0.53 |
Yunyao Li | 8 | 530 | 37.81 |
Bin Liu | 9 | 138 | 7.54 |
Sriram Raghavan | 10 | 1096 | 97.25 |
Frederick R. Reiss | 11 | 371 | 17.91 |
Shivakumar Vaithyanathan | 12 | 2518 | 234.40 |
Huaiyu Zhu | 13 | 62 | 3.08 |