Title
SNPboost: interaction analysis and risk prediction on GWA data
Abstract
Genome-wide association (GWA) studies, which typically aim to identify single nucleotide polymorphisms (SNPs) associated with a disease, yield large amounts of high-dimensional data. GWA studies have been successful in identifying single SNPs associated with complex diseases. So far, however, most of the identified associations have only a limited impact on risk prediction. Recent studies applying support vector machines (SVMs) have improved risk prediction for type I and type II diabetes, but a drawback is the poor interpretability of the classifier. Training the SVM on only a subset of SNPs would require a preselection, typically by p-value. Especially for complex diseases, this might not be the optimal selection strategy. In this work, we propose SNPboost, an extension of AdaBoost for GWA data. To improve classification, SNPboost successively selects a subset of SNPs. On real GWA data (German MI family study II), SNPboost outperformed a linear SVM and, when used as a preselector, further improved the performance of a non-linear SVM. Finally, we argue that the selected SNPs can be put into a biological context.
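The abstract describes SNPboost only as an AdaBoost extension that successively selects SNPs; this record does not give the algorithm itself. As a rough illustration of the general idea, the following is a minimal sketch of generic discrete AdaBoost in which each weak learner is a threshold rule on a single SNP, so that every boosting round picks one SNP. The genotype coding (0, 1, 2 minor-allele counts), the thresholds, and the names `fit_stump`, `boost_snps`, and `predict` are assumptions made for this example, not details taken from the paper.

```python
import math


def stump_predict(genotypes, threshold, polarity):
    """Classify each genotype as +1/-1 with a one-SNP threshold rule."""
    return [polarity * (1 if g > threshold else -1) for g in genotypes]


def fit_stump(X, y, w):
    """Return (snp_index, threshold, polarity, weighted_error) of the best stump."""
    best = None
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        for threshold in (0.5, 1.5):  # assumed splits for 0/1/2 genotype coding
            for polarity in (1, -1):
                pred = stump_predict(column, threshold, polarity)
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[3]:
                    best = (j, threshold, polarity, err)
    return best


def boost_snps(X, y, n_rounds=5):
    """Generic discrete AdaBoost; each round selects one SNP via a stump."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    model = []
    for _ in range(n_rounds):
        j, threshold, polarity, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        pred = stump_predict([row[j] for row in X], threshold, polarity)
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((j, threshold, polarity, alpha))
    return model


def predict(model, X):
    """Sign of the alpha-weighted vote over all selected stumps."""
    labels = []
    for row in X:
        score = sum(
            alpha * polarity * (1 if row[j] > threshold else -1)
            for j, threshold, polarity, alpha in model
        )
        labels.append(1 if score >= 0 else -1)
    return labels
```

In this sketch, the SNP indices stored in the returned model play the role of the selected subset; they could, for instance, be passed on as a preselection to another classifier, in the spirit of the preselector use mentioned in the abstract.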
Year
2011
DOI
10.1007/978-3-642-21738-8_15
Venue
ICANN (2)
Keywords
non-linear svm, real gwa data, complex disease, high-dimensional data, gwa study, linear svm, risk prediction, selected snps, single snps, interaction analysis, gwa data, genome wide association
Field
Interpretability, Disease, AdaBoost, Pattern recognition, Computer science, Support vector machine, Genome-wide association study, Single-nucleotide polymorphism, Artificial intelligence, Classifier (linguistics), Machine learning, Linear svm
DocType
Conference
Volume
6792
ISSN
0302-9743
Citations
0
PageRank
0.34
References
4
Authors
3
Name                    Order   Citations   PageRank
Ingrid Brænne           1       1           0.69
Jeanette Erdmann        2       3           0.85
Amir Madany Mamlouk     3       37          9.52