Title
Identification of Genetic Causality Statements in Medline Abstracts Leveraging Distant Supervision
Abstract
In the era of precision medicine, the clinical utility of next-generation sequencing technology depends heavily on the ability to interpret causal associations between genetic variants and phenotypes, which can be a labor-intensive process. Various resources, such as HGMD and ClinVar, catalog such associations. Given the exponential growth of the literature in the field, it is desirable to accelerate the process by automatically identifying genetic causality statements in the literature. Here, we define the identification of such statements as a classification task over sentences containing gene and disease entities. We used the Cancer Gene Census, available at the Catalogue of Somatic Mutations in Cancer (COSMIC), to generate a weakly labeled data set for our classification task. We evaluated multiple feature sets (words, bi-grams, and word embeddings) and several machine-learning methods, and achieved a weighted F-measure of around 95%. Evaluation using the top 50 genetic variant-disease sentences demonstrated that the proposed method can identify genetic causality statements.
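The weak-labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how sentences were matched against the Cancer Gene Census, so the labeling rule below is an assumption. The `KNOWN_PAIRS` set, the `weak_label` and `ngram_features` helpers, and the example sentences are all hypothetical. The sketch assumes gene and disease mentions have already been found by an upstream entity recognizer.

```python
from collections import Counter

# Hypothetical stand-in for gene/disease pairs drawn from a curated resource
# such as the COSMIC Cancer Gene Census; these entries are illustrative only.
KNOWN_PAIRS = {("BRAF", "melanoma"), ("BRCA1", "breast cancer")}


def weak_label(gene, disease, known_pairs=KNOWN_PAIRS):
    """Return 1 if the (gene, disease) pair is in the curated set, else 0."""
    return 1 if (gene, disease.lower()) in known_pairs else 0


def ngram_features(sentence):
    """Count lowercase word unigrams and bigrams in a sentence."""
    tokens = sentence.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


# Each example: (sentence, gene mention, disease mention). The mentions are
# assumed to come from an entity recognizer that is not shown here.
examples = [
    ("BRAF mutations drive melanoma", "BRAF", "melanoma"),
    ("TP53 was measured in asthma patients", "TP53", "asthma"),
]

# Pair each sentence's n-gram counts with its weak (distantly supervised) label.
dataset = [(ngram_features(s), weak_label(g, d)) for s, g, d in examples]
```

The resulting `dataset` of feature counts and noisy labels could then be passed to any standard classifier. Word-embedding features, which the abstract also mentions, are omitted here.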
Year
2018
DOI
10.1109/ICHI-W.2018.00008
Venue
2018 IEEE International Conference on Healthcare Informatics Workshop (ICHI-W)
Keywords
cancer,disease,causality,genetic variant,distance supervision,classification,Semantic Medline,MutD,ClinVar
Field
Causality,Precision medicine,Task analysis,Unified Modeling Language,Computer science,Natural language processing,Cataloging,Artificial intelligence,Word embedding,MEDLINE,Semantics
DocType
Conference
ISBN
978-1-5386-6778-1
Citations
0
PageRank
0.34
References
0
Authors
5
Name                          Order  Citations  PageRank
Liwei Wang                    1      63         10.92
Majid Rastegar-Mojarad        2      96         17.23
Ravikumar Komandur Elayavilli 3      34         7.04
Yanshan Wang                  4      47         19.00
Hongfang Liu                  5      1479       160.66