Title
Annotating gene sets by mining large literature collections with protein networks.
Abstract
Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
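The abstract's central idea, scoring literature phrases by their network proximity to a query gene set, can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say which distance measure is used, so random walk with restart is swapped in here as a common stand-in. The node names (G1 to G4, P_A, P_B), the edges, and the function names are all hypothetical.

```python
from collections import defaultdict


def build_network(gene_phrase_links, protein_interactions):
    """Merge gene-phrase links and protein interactions into one undirected graph."""
    adj = defaultdict(set)
    for a, b in list(gene_phrase_links) + list(protein_interactions):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def random_walk_with_restart(adj, seeds, restart=0.5, iters=100):
    """Return a visiting probability per node for a walk that restarts at the seeds.

    Assumption: this proximity score stands in for the unspecified
    "network distance" mentioned in the abstract.
    """
    nodes = list(adj)
    seeds = [s for s in seeds if s in adj]  # ignore genes absent from the graph
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        p = nxt
    return p


def rank_phrases(adj, gene_set, phrases):
    """Order candidate phrases from closest to farthest from the gene set."""
    scores = random_walk_with_restart(adj, gene_set)
    return sorted(phrases, key=lambda ph: scores.get(ph, 0.0), reverse=True)


# Hypothetical toy data: two phrases, four genes, a chain of interactions.
gene_phrase_links = [("G1", "P_A"), ("G2", "P_A"), ("G3", "P_B"), ("G4", "P_B")]
protein_interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]

network = build_network(gene_phrase_links, protein_interactions)
scores = random_walk_with_restart(network, ["G1", "G2"])
ranked = rank_phrases(network, ["G1", "G2"], ["P_A", "P_B"])
print(ranked)
```

In this toy graph the phrase linked directly to both query genes ranks first. The multiscale, ontology-building step described in the abstract is not covered by the sketch.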
Year
2018
Venue
Biocomputing-Pacific Symposium on Biocomputing
Keywords
text mining, functional annotations, knowledge network, gene interactions
Field
Data science, Text mining, Gene, Biology, Bioinformatics
DocType
Conference
Volume
23
ISSN
2335-6936
Citations
0
PageRank
0.34
References
0
Authors
8
Name | Order | Citations | PageRank
Sheng Wang | 1 | 49 | 8.26
Jianzhu Ma | 2 | 6 | 5.27
Michael Ku Yu | 3 | 0 | 0.34
Fan Zheng | 4 | 0 | 0.68
Edward Huang | 5 | 2 | 2.39
Jiawei Han | 6 | 43085 | 3824.48
Jian Peng | 7 | 430 | 50.07
Trey Ideker | 8 | 1360 | 112.95