Paper Info

Title
Automatic rule refinement for information extraction

Abstract
Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill, and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to determine the lineage of a tuple in a database, can be leveraged to assist in rule refinement. Specifically, given a set of extraction rules and correct and incorrect extracted data, we have developed a technique to suggest a ranked list of rule modifications that an expert rule specifier can consider. We implemented our technique in the SystemT information extraction system developed at IBM Research - Almaden and experimentally demonstrate its effectiveness.

Year	2010
DOI	10.14778/1920841.1920916
Venue	PVLDB
Keywords	rule modification, suitable information extraction rule, systemt information extraction system, extraction rule, data provenance, automatic rule refinement, rule-based information extraction, rules iteratively, expert rule specifier, unstructured text, rule refinement, information extraction, rule based
Field	Data mining, Rule-based system, IBM, Specifier, Ranking, Information retrieval, Tuple, Computer science, Information extraction, Database
DocType	Journal
Volume	3
Issue	1-2
ISSN	2150-8097
Citations	28
PageRank	0.98
References	26
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Bin Liu	1	138	7.54
Laura Chiticariu	2	757	41.60
Vivian Chu	3	74	4.67
H. V. Jagadish	4	11141	2495.67
Frederick R. Reiss	5	371	17.91
