Title
Evaluation of Phenotype Classification Methods for Obesity Using Direct to Consumer Genetic Data
Abstract
Direct-to-consumer genetic testing services are becoming increasingly ubiquitous. Consumers of such services are sharing their genetic and clinical information with the research community to facilitate the extraction of knowledge about different conditions. In this paper, we build on these services to analyse the genetic data of people with different BMI levels to determine the immediate and long-term risk factors associated with obesity. Using web scraping techniques, we created a dataset containing publicly available information about 230 participants from the Personal Genome Project. The dataset was then analysed to identify genetic variants associated with high BMI levels, following standard quality control and association analysis protocols for genome-wide association analysis. We applied a combination of a Random Forest-based feature selection algorithm and a Support Vector Machine with a Radial Basis Function kernel to the filtered dataset. Using a robust data science methodology, our approach identified obesity-related genetic variants to be used as features when predicting individual obesity susceptibility. The results reveal that the subset of features obtained through the Random Forest-based algorithm improves the performance of the classifier when compared to the top statistically significant genetic variants identified in logistic regression. The Support Vector Machine showed the best results, with sensitivity = 81%, specificity = 83% and area under the curve = 92%, when the model was trained with the top fifteen features selected by Boruta.
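The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses a synthetic feature matrix in place of the Personal Genome Project genotypes, and replaces Boruta with a simpler stand-in that keeps the fifteen features with the highest Random Forest importance. The function names `top_k_by_forest_importance` and `evaluate_rbf_svm` are hypothetical, and the quality control and association analysis steps are not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def top_k_by_forest_importance(X, y, k=15, seed=0):
    """Return indices of the k features with the highest Random Forest importance.

    Assumption: a plain importance ranking used as a stand-in for Boruta.
    """
    forest = RandomForestClassifier(n_estimators=300, random_state=seed)
    forest.fit(X, y)
    ranked = np.argsort(forest.feature_importances_)[::-1]
    return ranked[:k]


def evaluate_rbf_svm(X_train, y_train, X_test, y_test):
    """Fit an RBF-kernel SVM and report sensitivity, specificity and AUC."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = model.decision_function(X_test)  # continuous scores for the ROC curve
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": roc_auc_score(y_test, scores),
    }


# Synthetic stand-in data: 230 samples echo the cohort size in the abstract,
# but the values are random and unrelated to any real genotypes.
X, y = make_classification(
    n_samples=230, n_features=200, n_informative=15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Select features on the training split only, to avoid leaking test labels.
selected = top_k_by_forest_importance(X_train, y_train, k=15)
metrics = evaluate_rbf_svm(
    X_train[:, selected], y_train, X_test[:, selected], y_test
)
print(metrics)
```

Numbers produced by this sketch come from synthetic data and are not comparable to the results reported in the abstract.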
Year
2017
DOI
10.1007/978-3-319-63312-1_31
Venue
INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2017, PT II
Keywords
Bioinformatics, Data science, Machine learning, Feature selection, Genetics, Obesity, SNPs
Field
Genetic testing, Feature selection, Radial basis function kernel, Computer science, Support vector machine, Genetic association, Artificial intelligence, Random forest, Logistic regression, Machine learning, Personal Genome Project
DocType
Conference
Volume
10362
ISSN
0302-9743
Citations
0
PageRank
0.34
References
3
Authors
6
Name                              Order   Citations   PageRank
Casimiro Aday Curbelo Montañez    1       0           0.34
Paul Fergus                       2       173         33.04
Abir Jaafar Hussain               3       182         38.55
Dhiya Al-Jumeily                  4       118         41.52
Mehmet Tevfik Dorak               5       0           0.34
Rosni Abdullah                    6       156         24.82