Title
KnowLife: a versatile approach for constructing a large knowledge graph for biomedical sciences.
Abstract
Biomedical knowledge bases (KBs) have become important assets in life sciences. Prior work on KB construction has three major limitations. First, most biomedical KBs are manually built and curated, and cannot keep up with the rate at which new findings are published. Second, for automatic information extraction (IE), the text genre of choice has been scientific publications, neglecting sources like health portals and online communities. Third, most prior work on IE has focused on the molecular level or chemogenomics only, like protein-protein interactions or gene-drug relationships, or solely addresses highly specific topics such as drug effects.
We address these three limitations by a versatile and scalable approach to automatic KB construction. Using a small number of seed facts for distant supervision of pattern-based extraction, we harvest a huge number of facts in an automated manner without requiring any explicit training. We extend previous techniques for pattern-based IE with confidence statistics, and we combine this recall-oriented stage with logical reasoning for consistency constraint checking to achieve high precision. To our knowledge, this is the first method that uses consistency checking for biomedical relations. Our approach can be easily extended to incorporate additional relations and constraints. We ran extensive experiments not only for scientific publications, but also for encyclopedic health portals and online communities, creating different KBs based on different configurations. We assess the size and quality of each KB, in terms of number of facts and precision. The best configured KB, KnowLife, contains more than 500,000 facts at a precision of 93% for 13 relations covering genes, organs, diseases, symptoms, treatments, as well as environmental and lifestyle risk factors.
KnowLife is a large knowledge base for health and life sciences, automatically constructed from different Web sources. As a unique feature, KnowLife is harvested from different text genres such as scientific publications, health portals, and online communities. Thus, it has the potential to serve as a one-stop portal for a wide range of relations and use cases. To showcase the breadth and usefulness, we make the KnowLife KB accessible through the health portal (http://knowlife.mpi-inf.mpg.de).
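The two stages named in the abstract (seed-driven pattern harvesting with confidence statistics, followed by consistency checking) can be illustrated with a minimal sketch. This is not the KnowLife implementation: the toy sentences, the seed facts, the relation names "treats" and "causes", the mutual-exclusion constraint, the confidence formula, and the greedy conflict resolution are all assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy corpus: (left entity, connecting text pattern, right entity).
SENTENCES = [
    ("aspirin", "relieves", "headache"),
    ("ibuprofen", "relieves", "fever"),
    ("metformin", "relieves", "diabetes"),
    ("smoking", "leads to", "lung cancer"),
    ("asbestos", "leads to", "mesothelioma"),
    ("ibuprofen", "is linked to", "fever"),
    ("smoking", "is linked to", "lung cancer"),
    ("asbestos", "is linked to", "mesothelioma"),
    ("obesity", "is linked to", "diabetes"),
]

# Hypothetical seed facts used for distant supervision.
SEEDS = {
    ("aspirin", "headache"): "treats",
    ("ibuprofen", "fever"): "treats",
    ("smoking", "lung cancer"): "causes",
    ("asbestos", "mesothelioma"): "causes",
}

# Assumed constraint: an entity pair cannot hold both relations at once.
MUTUALLY_EXCLUSIVE = {frozenset({"treats", "causes"})}


def score_patterns(sentences, seeds):
    """Confidence of (pattern, relation): share of the pattern's seed matches
    that belong to that relation."""
    counts = defaultdict(lambda: defaultdict(int))
    for left, pattern, right in sentences:
        relation = seeds.get((left, right))
        if relation is not None:
            counts[pattern][relation] += 1
    scores = {}
    for pattern, by_relation in counts.items():
        total = sum(by_relation.values())
        for relation, hits in by_relation.items():
            scores[(pattern, relation)] = hits / total
    return scores


def extract_candidates(sentences, scores):
    """Recall-oriented stage: apply every scored pattern to every sentence,
    keeping the best confidence seen for each candidate fact."""
    candidates = {}
    for left, pattern, right in sentences:
        for (scored_pattern, relation), conf in scores.items():
            if scored_pattern == pattern:
                fact = (left, relation, right)
                candidates[fact] = max(conf, candidates.get(fact, 0.0))
    return candidates


def enforce_consistency(candidates, exclusive):
    """Precision-oriented stage: per entity pair, accept relations from most to
    least confident and skip any that conflict with one already accepted."""
    by_pair = defaultdict(dict)
    for (left, relation, right), conf in candidates.items():
        by_pair[(left, right)][relation] = conf
    accepted = {}
    for (left, right), relations in by_pair.items():
        kept = []
        for relation, conf in sorted(relations.items(), key=lambda kv: -kv[1]):
            if any(frozenset({relation, other}) in exclusive for other in kept):
                continue
            kept.append(relation)
            accepted[(left, relation, right)] = conf
    return accepted


scores = score_patterns(SENTENCES, SEEDS)
facts = enforce_consistency(extract_candidates(SENTENCES, scores), MUTUALLY_EXCLUSIVE)
for fact, conf in sorted(facts.items()):
    print(fact, round(conf, 2))
```

In this toy setup the ambiguous pattern "is linked to" proposes both relations for the same entity pair, and the constraint keeps only the better-supported one. A production system would typically replace the greedy step with a proper constraint solver.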
Year
2015
DOI
10.1186/s12859-015-0549-5
Venue
BMC Bioinformatics
Keywords
Biomedical text mining, Knowledge base, Relation extraction
Field
Data science, Knowledge graph, Computer science, Information extraction, Software, Biomedical text mining, Knowledge base, Chemogenomics, Bioinformatics, Relationship extraction
DocType
Journal
Volume
16
Issue
1
ISSN
1471-2105
Citations
30
PageRank
1.04
References
42
Authors
3
Name, Order, Citations, PageRank
Patrick Ernst, 1, 70, 6.51
Amy Siu, 2, 30, 1.04
Gerhard Weikum, 3, 12710, 2146.01