Title
Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction
Abstract
Many domains in Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journal articles, and organizing those facts into a structured form, such as a database. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-out test set. Because accuracy in such domains is dominated by the abundant negative examples, performance is commonly measured with precision and recall instead. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension of recall-precision curves and employs an "at least N of these M clauses" thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable test-set results in a fraction of the training time needed for the ensembles.
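The clause-collection and "at least N of these M clauses" steps can be illustrated with a short Python sketch. It assumes that each candidate clause's recall and precision, and each selected clause's coverage of the test examples (as a boolean matrix), have already been computed. It also assumes that within each recall bin the clause with the highest precision is kept; that scoring rule is a simplification chosen for illustration. All function names are hypothetical, and this is not the authors' Gleaner implementation or its randomized search.

```python
from typing import List, Tuple


def best_clause_per_recall_bin(
    candidates: List[Tuple[str, float, float]], num_bins: int
) -> List[str]:
    """Keep one clause per recall bin (hypothetical helper).

    candidates holds (clause_name, recall, precision) triples. Assumption:
    the clause with the highest precision wins its bin.
    """
    best = {}
    for name, recall, precision in candidates:
        # Map recall in [0, 1] to a bin index; recall == 1.0 goes in the last bin.
        b = min(int(recall * num_bins), num_bins - 1)
        if b not in best or precision > best[b][1]:
            best[b] = (name, precision)
    # Return the surviving clause names ordered by recall bin.
    return [best[b][0] for b in sorted(best)]


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def at_least_n_of_m(clause_hits: List[List[bool]], n: int) -> List[bool]:
    """Predict positive when at least n of the M clauses cover an example.

    clause_hits[i][j] is True when clause j covers example i.
    """
    return [sum(row) >= n for row in clause_hits]


def recall_precision_curve(
    clause_hits: List[List[bool]], actual: List[bool]
) -> List[Tuple[int, float, float]]:
    """Sweep N from 1 to M and return (N, recall, precision) points."""
    m = len(clause_hits[0])
    curve = []
    for n in range(1, m + 1):
        precision, recall = precision_recall(at_least_n_of_m(clause_hits, n), actual)
        curve.append((n, recall, precision))
    return curve


if __name__ == "__main__":
    # Toy data: 4 examples (rows) covered by 3 selected clauses (columns).
    hits = [
        [True, True, True],
        [True, True, False],
        [True, False, False],
        [False, False, False],
    ]
    labels = [True, True, False, True]
    for n, r, p in recall_precision_curve(hits, labels):
        print(f"N={n}: recall={r:.2f}, precision={p:.2f}")
```

On the toy data, raising N from 1 to 3 trades recall for precision, which is how a single set of M clauses can trace out several points on a recall-precision curve.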
Year
2004
DOI
10.1007/978-3-540-30109-7_11
Venue
Lecture Notes in Artificial Intelligence
Keywords
first order, machine learning, information extraction, protein localization, spectrum
Field
Inductive logic programming, Search algorithm, First order, Computer science, Precision and recall, Aleph, Information extraction, Artificial intelligence, Thresholding, Recall, Machine learning
DocType
Conference
Volume
3194
ISSN
0302-9743
Citations
18
PageRank
1.37
References
24
Authors
3
Name
Order
Citations
PageRank
Mark Goadrich1103545.21
Louis Oliphant2433.44
Jude W. Shavlik33057619.89