Title | ||
---|---|---|
Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction |
Abstract | ||
---|---|---|
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-aside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least N of these M clauses" thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable test-set results in a fraction of the training time needed for ensembles. |
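
The "at least N of these M clauses" rule and the precision and recall measures mentioned in the abstract can be illustrated with a small sketch. This is not the authors' Gleaner implementation: the clause representation (plain Python predicates over a feature dictionary), the function names, and the toy data are all assumptions made for illustration, and the clause search itself is not shown. The sketch only shows how raising N trades recall for precision when M clauses vote on each candidate extraction.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a "clause" is any predicate over an example.
Example = Dict[str, bool]
Clause = Callable[[Example], bool]


def at_least_n_of_m(clauses: Sequence[Clause], example: Example, n: int) -> bool:
    """Label the example positive if at least n of the clauses cover it."""
    votes = sum(1 for clause in clauses if clause(example))
    return votes >= n


def precision_recall(predictions: Sequence[bool],
                     labels: Sequence[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) from parallel prediction/label lists."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep_thresholds(clauses: Sequence[Clause],
                     examples: Sequence[Example],
                     labels: Sequence[bool]) -> List[Tuple[int, float, float]]:
    """Return (n, precision, recall) for every threshold n = 1..M."""
    points = []
    for n in range(1, len(clauses) + 1):
        predictions = [at_least_n_of_m(clauses, ex, n) for ex in examples]
        precision, recall = precision_recall(predictions, labels)
        points.append((n, precision, recall))
    return points


# Toy data (invented): three clauses, five candidate extractions.
clauses: List[Clause] = [
    lambda ex: ex["a"],
    lambda ex: ex["b"],
    lambda ex: ex["c"],
]
examples: List[Example] = [
    {"a": True, "b": True, "c": False},
    {"a": True, "b": False, "c": False},
    {"a": False, "b": True, "c": True},
    {"a": False, "b": False, "c": False},
    {"a": False, "b": False, "c": True},
]
labels = [True, False, True, False, True]

curve = sweep_thresholds(clauses, examples, labels)
for n, precision, recall in curve:
    print(f"N={n}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, N=1 covers every positive but also admits one negative, while N=2 removes the false positive at the cost of missing one positive. Each threshold therefore yields one point on a recall-precision curve.
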
Year | DOI | Venue |
---|---|---|
2004 | 10.1007/978-3-540-30109-7_11 | Lecture Notes in Artificial Intelligence |
Keywords | Field | DocType |
first order,machine learning,information extraction,protein localization,spectrum | Inductive logic programming,Search algorithm,First order,Computer science,Precision and recall,Aleph,Information extraction,Artificial intelligence,Thresholding,Recall,Machine learning | Conference |
Volume | ISSN | Citations |
3194 | 0302-9743 | 18 |
PageRank | References | Authors |
1.37 | 24 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mark Goadrich | 1 | 1035 | 45.21 |
Louis Oliphant | 2 | 43 | 3.44 |
Jude W. Shavlik | 3 | 3057 | 619.89 |