Paper Info

Title
Learning Ensembles of First-Order Clauses for Recall-Precision Curves: A Case Study in Biomedical Information Extraction

Abstract
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. Our research has focused on Information Extraction (IE), a task that typically involves many more negative examples than positive examples. IE is the process of finding facts in unstructured text, such as biomedical journals, and putting those facts in an organized system. In particular, we have focused on learning to recognize instances of the protein-localization relationship in Medline abstracts. We view the problem as a machine-learning task: given positive and negative extractions from a training corpus of abstracts, learn a logical theory that performs well on a held-aside testing set. A common way to measure performance in these domains is to use precision and recall instead of simply using accuracy. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension in recall-precision curves and employs an "at least N of these M clauses" thresholding method to combine the selected clauses. We compare Gleaner to ensembles of standard Aleph theories and find that Gleaner produces comparable test-set results in a fraction of the training time needed for ensembles.
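
The "at least N of these M clauses" combination rule and the resulting recall-precision points can be illustrated with a minimal Python sketch. This is not Gleaner itself: the randomized clause search is omitted, and the representation of a clause as a boolean function over a feature dictionary, the feature names, and the toy data are all assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a clause is any boolean test on an example.
Example = Dict[str, bool]
Clause = Callable[[Example], bool]


def at_least_n_of_m(clauses: Sequence[Clause], n: int, example: Example) -> bool:
    """Label the example positive if at least n of the M clauses cover it."""
    return sum(1 for clause in clauses if clause(example)) >= n


def precision_recall(preds: Sequence[bool], labels: Sequence[bool]) -> Tuple[float, float]:
    """Compute (precision, recall) from boolean predictions and true labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def recall_precision_curve(
    clauses: Sequence[Clause], examples: Sequence[Example], labels: Sequence[bool]
) -> List[Tuple[int, float, float]]:
    """Sweep the threshold n from 1 to M; return (n, recall, precision) per setting."""
    curve = []
    for n in range(1, len(clauses) + 1):
        preds = [at_least_n_of_m(clauses, n, ex) for ex in examples]
        precision, recall = precision_recall(preds, labels)
        curve.append((n, recall, precision))
    return curve


# Toy data (invented for this sketch): three clauses, four candidate extractions.
toy_clauses: List[Clause] = [
    lambda ex: ex["same_sentence"],
    lambda ex: ex["verb_between"],
    lambda ex: ex["protein_first"],
]
toy_examples: List[Example] = [
    {"same_sentence": True, "verb_between": True, "protein_first": True},
    {"same_sentence": True, "verb_between": True, "protein_first": False},
    {"same_sentence": True, "verb_between": False, "protein_first": False},
    {"same_sentence": False, "verb_between": False, "protein_first": False},
]
toy_labels = [True, True, False, False]

curve = recall_precision_curve(toy_clauses, toy_examples, toy_labels)
for n, recall, precision in curve:
    print(f"n={n}: recall={recall:.2f}, precision={precision:.2f}")
```

Raising the threshold n makes the combined theory stricter, so recall can only stay the same or fall as n grows, while precision typically rises; sweeping n is what traces out points along a recall-precision curve.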

Year	2004
DOI	10.1007/978-3-540-30109-7_11
Venue	LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
Keywords	first order, machine learning, information extraction, protein localization, spectrum
Field	Inductive logic programming, Search algorithm, First order, Computer science, Precision and recall, Aleph, Information extraction, Artificial intelligence, Thresholding, Recall, Machine learning
DocType	Conference
Volume	3194
ISSN	0302-9743
Citations	18
PageRank	1.37
References	24
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Mark Goadrich	1	1035	45.21
Louis Oliphant	2	43	3.44
Jude W. Shavlik	3	3057	619.89