Abstract |
---|
Accurate phenotype mapping will play an important role in facilitating Phenome-Wide Association Studies (PheWAS), and potentially in other phenomics-based studies. The PheWAS approach investigates the association between genetic variation and an extensive range of phenotypes in a high-throughput manner to better understand the impact of genetic variations on multiple phenotypes. Herein we define the phenotype mapping problem posed by PheWAS analyses, discuss the challenges, and present a machine-learning solution. Our key ideas include the use of weighted Jaccard features and term augmentation by dictionary lookup. Compared with features based on string similarity metrics, our approach improves the F-score from 0.59 to 0.73. With augmentation, we show further improvement in F-score to 0.89. For terms not covered by the dictionary, we use transitive closure inference and reach an F-score of 0.91, close to a level sufficient for practical use. We also show that our model generalizes well to phenotypes not used in our training dataset. |
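
The abstract names weighted Jaccard features but does not define them. As a rough illustration only, the sketch below computes a token-level weighted Jaccard similarity between two phenotype descriptions. The tokenizer, the IDF-style weighting, and the function names (`tokenize`, `idf_weights`, `weighted_jaccard`) are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(term):
    """Lowercase a phenotype description and split it into a set of tokens."""
    return set(term.lower().replace("_", " ").split())


def idf_weights(terms):
    """Assumed weighting scheme: rarer tokens get larger weights (IDF-style)."""
    n = len(terms)
    doc_freq = Counter(tok for t in terms for tok in tokenize(t))
    return {tok: math.log(1 + n / count) for tok, count in doc_freq.items()}


def weighted_jaccard(a, b, weights, default=1.0):
    """Sum of weights of shared tokens divided by sum of weights of all tokens."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    shared = tokens_a & tokens_b

    def weight(tok):
        # Tokens unseen when the weights were built fall back to `default`.
        return weights.get(tok, default)

    return sum(weight(t) for t in shared) / sum(weight(t) for t in union)


# Hypothetical phenotype descriptions, used only to build example weights.
corpus = [
    "history of diabetes",
    "history of stroke",
    "history of asthma",
    "diabetes diagnosis",
]
weights = idf_weights(corpus)
score = weighted_jaccard("history of diabetes", "history of stroke", weights)
```

With these example weights, two descriptions that share only frequent tokens such as "history" and "of" score lower than they would under plain (unweighted) Jaccard, which is the usual motivation for weighting.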

Year | Venue | Keywords |
---|---|---|
2011 | BioNLP@ACL | dictionary lookup,phenotype mapping problem,practical use,phewas analysis,genetic variation,term augmentation,phe-was approach,phenome-wide association studies,accurate phenotype mapping,large genetic data,multiple phenotypes |

Field | DocType | Citations |
---|---|---|
Phenomics,Phenotype,Computer science,Inference,Genetic association,Jaccard index,Artificial intelligence,Natural language processing,Transitive closure,String metric,Machine learning | Conference | 2 |

PageRank | References | Authors |
---|---|---|
0.40 | 12 | 6 |

Name | Order | Citations | PageRank |
---|---|---|---|
C. N. Hsu | 1 | 1233 | 157.54 |
Cheng-Ju Kuo | 2 | 196 | 8.26 |
Congxing Cai | 3 | 93 | 4.76 |
Sarah A. Pendergrass | 4 | 42 | 9.20 |
Marylyn D. Ritchie | 5 | 692 | 86.79 |
José Luis Ambite | 6 | 958 | 110.89 |