Title | ||
---|---|---|
Predicting Pathogenic Non-Coding Variants On Imbalanced Data Set Using Cluster Ensemble Sampling |
Abstract | ||
---|---|---|
In the past few years, personal whole-genome sequencing has reported many variants in the non-coding regions of the human genome. Distinguishing the pathogenic non-coding variants from the far larger number of benign ones is a challenge. Many machine learning methods have been proposed for predicting pathogenic non-coding variants; however, the precision and recall of existing methods decline rapidly as the number of negative samples in the data increases. Both under- and over-sampling techniques have been used in machine learning to counter the poor performance of classifiers on imbalanced data. Even so, we observed that a more sophisticated, better-performing method is still needed for predicting pathogenic non-coding variants. This study therefore presents CE-SMURF, a general framework for learning from imbalanced data that combines Cluster Ensemble (CE) sampling with hyper-ensemble techniques to further improve the accuracy of detecting pathogenic non-coding variants. The results demonstrate that the final setting of CE-SMURF (f = 0, r = 0.1) is superior in training and outperforms other existing methods on the testing data, providing valuable insight into tackling imbalanced learning in future applications of genomic precision medicine. |
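The abstract names Cluster Ensemble (CE) sampling without detailing it, and the parameters f and r are not defined in this record. The Python sketch below shows one generic form of cluster-based under-sampling feeding an ensemble, to make the idea concrete. It is an assumption-laden illustration, not CE-SMURF's procedure: the k-means step, the round-robin draw across clusters, the nearest-centroid base learner, the toy data, and every function name (`kmeans`, `cluster_balanced_subsets`, `train_nearest_centroid`, `ensemble_score`) are hypothetical.

```python
# Hypothetical illustration of cluster-based under-sampling plus an ensemble.
# This is NOT the CE-SMURF algorithm; all names and choices are assumptions.
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def mean_vector(xs):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(xs) for col in zip(*xs)]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


def cluster_balanced_subsets(positives, negatives, k, n_subsets, seed=0):
    """Build n_subsets balanced training sets. Each holds every positive plus
    an equal number of negatives, drawn round-robin across k clusters of the
    negatives so that every region of the majority class is represented."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for x, lab in zip(negatives, kmeans(negatives, k, seed=seed)):
        by_cluster[lab].append(x)
    clusters = list(by_cluster.values())
    subsets = []
    for _ in range(n_subsets):
        picked = []
        while len(picked) < len(positives):
            for members in clusters:
                if len(picked) == len(positives):
                    break
                picked.append(rng.choice(members))  # sampling with replacement
        subsets.append([(x, 1) for x in positives] + [(x, 0) for x in picked])
    return subsets


def train_nearest_centroid(subset):
    """A deliberately simple base learner: one centroid per class."""
    pos = [x for x, y in subset if y == 1]
    neg = [x for x, y in subset if y == 0]
    return mean_vector(pos), mean_vector(neg)


def ensemble_score(models, x):
    """Fraction of members voting 'pathogenic' (closer to the positive centroid)."""
    votes = sum(sq_dist(x, mp) < sq_dist(x, mn) for mp, mn in models)
    return votes / len(models)


# Toy data: 10 "pathogenic" points vs 200 "benign" points (a 1:20 imbalance).
data_rng = random.Random(1)
positives = [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(10)]
negatives = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
subsets = cluster_balanced_subsets(positives, negatives, k=5, n_subsets=7)
models = [train_nearest_centroid(s) for s in subsets]
print(ensemble_score(models, [3.0, 3.0]), ensemble_score(models, [0.0, 0.0]))
```

Drawing evenly from each cluster, rather than uniformly from all negatives, is the design choice meant to keep small benign sub-populations from being dropped entirely by under-sampling; averaging votes over several such subsets recovers some of the information any single balanced subset discards.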
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/BIBE.2019.00158 | 2019 IEEE 19TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE) |
Keywords | Field | DocType |
imbalanced, non-coding, pathogenic variants, machine learning | Precision medicine, Computer science, Precision and recall, Coding (social sciences), Whole genome sequencing, Sampling (statistics), Test data, Artificial intelligence, Human genome, Machine learning | Conference |
ISSN | Citations | PageRank |
2471-7819 | 0 | 0.34 |
References | Authors | |
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kai-Wen Chuang | 1 | 0 | 0.34 |
Chien-Yu Chen | 2 | 367 | 29.24 |