Title
iEsGene-ZCPseKNC: Identify Essential Genes Based on Z Curve Pseudo k-Tuple Nucleotide Composition
Abstract
Computational identification of essential genes is an important technique in synthetic biology and facilitates the development of related fields such as genome analysis and drug design. The identification of prokaryotic essential genes has been studied extensively, with most work focusing on essential genes in bacteria. Archaea, an important prokaryotic domain, exhibit high variance in genome size, yet no predictor has been available for essential genes in archaea. In this paper, we develop the first computational predictor of essential genes in archaea, called iEsGene-ZCPseKNC. To capture the sequence patterns of essential genes, we propose a new feature called Z curve pseudo k-tuple nucleotide composition (ZCPseKNC), which combines the advantages of the Z curve and pseudo k-tuple nucleotide composition (PseKNC). To overcome the problems caused by the imbalanced training set, the SMOTE algorithm is employed to further improve the predictive performance of iEsGene-ZCPseKNC. Evaluated by the rigorous jackknife test on a benchmark dataset, iEsGene-ZCPseKNC outperformed predictors based on the Z curve and on PseKNC, indicating that it is useful for identifying essential genes in archaea and could be a powerful tool for genome analysis. A user-friendly web server for the iEsGene-ZCPseKNC predictor is available at http://bliulab.net/iEsGene-ZCPseKNC/.
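The two building blocks named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of the Z curve components and of plain k-tuple (k-mer) composition; it is not the ZCPseKNC feature itself, whose exact formula is not given in this record. The function names `z_curve` and `ktuple_composition` are hypothetical and chosen only for illustration.

```python
from itertools import product


def z_curve(seq):
    """Return the (x, y, z) Z curve components from normalized base frequencies.

    Assumption: the standard frequency-based Z curve definition, not
    necessarily the exact variant used by iEsGene-ZCPseKNC.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    f = {base: seq.count(base) / n for base in "ACGT"}
    x = (f["A"] + f["G"]) - (f["C"] + f["T"])  # purine vs. pyrimidine
    y = (f["A"] + f["C"]) - (f["G"] + f["T"])  # amino vs. keto
    z = (f["A"] + f["T"]) - (f["G"] + f["C"])  # weak vs. strong hydrogen bonding
    return x, y, z


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-tuples, in lexicographic order.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

A feature vector for a classifier could then be formed, for example, by concatenating `z_curve(seq)` with `ktuple_composition(seq, k)`; how the published predictor actually combines the two is described in the paper, not here.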
Year: 2019
DOI: 10.1109/ACCESS.2019.2952237
Venue: IEEE Access
Keywords: Essential gene prediction, ZCPseKNC, support vector machine, SMOTE
DocType: Journal
Volume: 7
ISSN: 2169-3536
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Jiahai Chen | 1 | 0 | 0.34
Yongmin Liu | 2 | 0 | 0.34
Liao Qing31916.80
Bin Liu441933.30