Title
Learning phenotype mapping for integrating large genetic data
Abstract
Accurate phenotype mapping will play an important role in facilitating Phenome-Wide Association Studies (PheWAS), and potentially in other phenomics-based studies. The PheWAS approach investigates the association between genetic variation and an extensive range of phenotypes in a high-throughput manner to better understand the impact of genetic variations on multiple phenotypes. Herein we define the phenotype mapping problem posed by PheWAS analyses, discuss the challenges, and present a machine-learning solution. Our key ideas include the use of weighted Jaccard features and term augmentation by dictionary lookup. Compared to features based on string similarity metrics, our approach improves the F-score from 0.59 to 0.73. With augmentation, the F-score improves further to 0.89. For terms not covered by the dictionary, we use transitive closure inference and reach an F-score of 0.91, close to a level sufficient for practical use. We also show that our model generalizes well to phenotypes not used in our training dataset.
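The abstract names two techniques without defining them: weighted Jaccard features and transitive closure inference. The following is a minimal sketch of what each could look like, not the authors' implementation. It assumes whitespace tokenization, an inverse-document-frequency style weighting, and treating predicted matches as an equivalence relation; the function names (tokenize, idf_weights, weighted_jaccard, transitive_closure) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the paper may tokenize differently.
    return set(text.lower().split())


def idf_weights(descriptions):
    # Assumption: smoothed IDF weights, so rarer tokens count for more.
    n = len(descriptions)
    df = Counter(tok for d in descriptions for tok in tokenize(d))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def weighted_jaccard(a, b, weights, default=1.0):
    # Sum of weights over shared tokens divided by sum over all tokens.
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0
    shared = ta & tb
    num = sum(weights.get(t, default) for t in shared)
    den = sum(weights.get(t, default) for t in union)
    return num / den


def transitive_closure(pairs):
    # Union-find: if A matches B and B matches C, infer that A matches C.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)

    closed = set()
    for members in groups.values():
        ordered = sorted(members)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                closed.add((ordered[i], ordered[j]))
    return closed
```

With weights {"blood": 1.0, "pressure": 1.0, "systolic": 2.0}, comparing "systolic blood pressure" to "blood pressure" gives (1 + 1) / (1 + 1 + 2) = 0.5, whereas unweighted Jaccard would give 2/3. In a mapping pipeline, a score like this would be one feature fed to a classifier, and the closure step would be applied to the classifier's predicted matches.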
Year
2011
Venue
BioNLP@ACL
Keywords
dictionary lookup,phenotype mapping problem,practical use,phewas analysis,genetic variation,term augmentation,phe-was approach,phenome-wide association studies,accurate phenotype mapping,large genetic data,multiple phenotypes
Field
Phenomics,Phenotype,Computer science,Inference,Genetic association,Jaccard index,Artificial intelligence,Natural language processing,Transitive closure,String metric,Machine learning
DocType
Conference
Citations
2
PageRank
0.40
References
12
Authors
6
Name
Order
Citations
PageRank
C. N. Hsu11233157.54
Cheng-Ju Kuo21968.26
Congxing Cai3934.76
Sarah A. Pendergrass4429.20
Marylyn D. Ritchie569286.79
José Luis Ambite6958110.89