Abstract |
---|
Motivation: Carbohydrate-active enzymes (CAZymes) are extremely important to bioenergy, human gut microbiome, and plant pathogen research and industry. Here we developed a new amino acid k-mer-based tool for CAZyme classification, motif identification and genome annotation, using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies, each with distinguishing k-mer peptides. These k-mers represent the characteristic motifs (a collection of conserved short peptides) of each subfamily and were therefore also used to annotate CAZymes in new genomes. We further generalized this idea to extract characteristic k-mer peptides for all Swiss-Prot enzymes classified by Enzyme Commission (EC) numbers and applied them to enzyme EC prediction. Results: The new tool was implemented as a Python package named eCAMI. Benchmarking eCAMI against state-of-the-art tools on CAZyme and enzyme EC datasets showed that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer-based tools (PPR-Hotpep, CUPP and eCAMI) perform better than homology-based and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer-based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes. |
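The idea of annotating a protein by the characteristic k-mer peptides it shares with each subfamily can be illustrated with a minimal Python sketch. It is not eCAMI's implementation: the bipartite-network clustering is swapped for a simple filter (keep k-mers found in at least `min_frac` of a subfamily's members and in no other subfamily), and the function names, the default `k = 8`, the `min_hits` threshold and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq: str, k: int = 8) -> set[str]:
    """Return the set of overlapping amino-acid k-mers in seq (k=8 is an assumed default)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def characteristic_kmers(subfamilies: dict[str, list[str]], k: int = 8,
                         min_frac: float = 0.5) -> dict[str, set[str]]:
    """Hypothetical stand-in for eCAMI's grouping step: keep k-mers conserved in a
    subfamily (present in >= min_frac of its sequences) and absent from all others."""
    conserved = {}
    for name, seqs in subfamilies.items():
        # Each sequence contributes a set, so counts = number of sequences containing the k-mer.
        counts = Counter(km for s in seqs for km in kmers(s, k))
        conserved[name] = {km for km, c in counts.items() if c / len(seqs) >= min_frac}
    signatures = {}
    for name, kms in conserved.items():
        others = set().union(*(v for n, v in conserved.items() if n != name))
        signatures[name] = kms - others  # distinguishing k-mers only
    return signatures


def annotate(query: str, signatures: dict[str, set[str]], k: int = 8, min_hits: int = 1):
    """Assign query to the subfamily sharing the most characteristic k-mers.
    Returns (subfamily, hit_count), or None if fewer than min_hits k-mers match."""
    q = kmers(query, k)
    best = max(signatures, key=lambda n: len(signatures[n] & q), default=None)
    if best is None:
        return None
    hits = len(signatures[best] & q)
    return (best, hits) if hits >= min_hits else None


# Toy usage with invented sequences and k=3 so the k-mers are easy to read.
toy_subfamilies = {
    "GH5_a": ["MKTAYIAKQR", "MKTAYLAKQR"],
    "GH5_b": ["GHWWPLNDEE", "GHWWPLSDEE"],
}
signatures = characteristic_kmers(toy_subfamilies, k=3, min_frac=1.0)
print(annotate("AAMKTAYVVV", signatures, k=3))  # ('GH5_a', 3)
```

Returning the matched k-mer count alongside the label mirrors the point made in the abstract: a k-mer-based assignment can report which conserved peptides support the prediction, which a score from a homology search or a neural network does not directly provide.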

Year | DOI | Venue |
---|---|---|
2020 | 10.1093/bioinformatics/btz908 | BIOINFORMATICS |

DocType | Volume | Issue |
---|---|---|
Journal | 36 | 7 |

ISSN | Citations | PageRank |
---|---|---|
1367-4803 | 0 | 0.34 |

References | Authors |
---|---|
0 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Jing Xu | 1 | 0 | 2.03 |
Han Zhang | 2 | 7 | 5.29 |
Jinfang Zheng | 3 | 0 | 0.68 |
Philippe Dovoedo | 4 | 0 | 0.34 |
Yanbin Yin | 5 | 31 | 7.75 |