Title
Predicting Pathogenic Non-Coding Variants On Imbalanced Data Set Using Cluster Ensemble Sampling
Abstract
In the past few years, personal whole-genome sequencing has reported many variants in the non-coding regions of the human genome. Distinguishing pathogenic non-coding variants from the much larger number of benign ones is a challenge. Many machine learning methods for predicting pathogenic non-coding variants have been proposed. However, the precision and recall of existing methods decline rapidly as the number of negative samples in the data increases. Both under- and over-sampling techniques have been employed in machine learning to address the poor performance of classification methods on imbalanced data. Even so, we observed that a more sophisticated method with better performance is still needed for predicting pathogenic non-coding variants. This study therefore presents a general framework for imbalanced data learning, CE-SMURF, which combines Cluster Ensemble (CE) sampling with hyper-ensemble techniques to further improve the accuracy of predicting pathogenic non-coding variants. The results demonstrate that the final setting of CE-SMURF (f = 0, r = 0.1) is superior in training and outperforms other existing methods on the testing data, offering valuable insight into tackling imbalanced learning for future applications in genomic precision medicine.
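The abstract does not spell out how CE-SMURF samples or what the parameters f and r control, so the following is only a generic sketch of the cluster-then-undersample idea it alludes to, not the authors' algorithm. It assumes two-dimensional toy feature vectors. The helper names (`kmeans`, `cluster_balanced_subsets`) and the parameters `k` and `n_subsets` are hypothetical, introduced for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns one cluster label per point. Illustrative only.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_balanced_subsets(positives, negatives, k=3, n_subsets=5, seed=0):
    """Build several roughly balanced training sets (hypothetical helper).

    Each subset keeps every positive sample and draws an equal share of
    negatives from each non-empty cluster of the negative class.
    """
    rng = random.Random(seed)
    labels = kmeans(negatives, k, seed=seed)
    clusters = [[p for p, lab in zip(negatives, labels) if lab == c] for c in range(k)]
    clusters = [members for members in clusters if members]
    per_cluster = max(1, len(positives) // len(clusters))
    subsets = []
    for _ in range(n_subsets):
        sampled = []
        for members in clusters:
            sampled.extend(rng.sample(members, min(per_cluster, len(members))))
        subsets.append([(x, 1) for x in positives] + [(x, 0) for x in sampled])
    return subsets


# Toy imbalanced data: 10 "pathogenic" points versus 1000 "benign" points.
data_rng = random.Random(1)
positives = [(data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)) for _ in range(10)]
negatives = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
subsets = cluster_balanced_subsets(positives, negatives)
```

In an ensemble setting, one base classifier would typically be trained on each subset and their scores averaged. That step is left out here because the abstract does not name the base learner or the aggregation rule.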
Year: 2019
DOI: 10.1109/BIBE.2019.00158
Venue: 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE)
Keywords: imbalanced, non-coding, pathogenic variants, machine learning
Field: Precision medicine, Computer science, Precision and recall, Coding (social sciences), Whole genome sequencing, Sampling (statistics), Test data, Artificial intelligence, Human genome, Machine learning
DocType: Conference
ISSN: 2471-7819
Citations: 0
PageRank: 0.34
References: 0
Authors: 2
Name, Order, Citations, PageRank
Kai-Wen Chuang, 1, 0, 0.34
Chien-Yu Chen, 2, 367, 29.24