Title
eCAMI: simultaneous classification and motif identification for enzyme annotation.
Abstract
Motivation: Carbohydrate-active enzymes (CAZymes) are extremely important to research and industry in bioenergy, the human gut microbiome and plant pathogens. Here we developed a new amino acid k-mer-based tool for CAZyme classification, motif identification and genome annotation using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies, each with distinguishing k-mer peptides. These k-mers represent the characteristic motifs (in the form of a collection of conserved short peptides) of each subfamily, and were therefore further used to annotate new genomes for CAZymes. This idea was also generalized to extract characteristic k-mer peptides for all the Swiss-Prot enzymes classified by EC (Enzyme Commission) numbers, and applied to enzyme EC prediction.
Results: This new tool was implemented as a Python package named eCAMI. Benchmark analysis of eCAMI against the state-of-the-art tools on CAZyme and enzyme EC datasets found that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer-based tools (including PPR-Hotpep, CUPP and eCAMI) perform better than homology-based tools and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer-based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes.
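The k-mer idea in the abstract can be illustrated with a minimal Python sketch. It is not the eCAMI implementation and does not use the eCAMI package or its bipartite network algorithm: it is a simplified set-based stand-in that keeps k-mers shared within one group and absent from the others, then assigns a query sequence to the group with the most matching k-mers. All names (`kmers`, `characteristic_kmers`, `annotate`, `min_fraction`, `min_hits`), the toy sequences and the group labels are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of overlapping amino acid k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def characteristic_kmers(groups, k, min_fraction=1.0):
    """Pick k-mers conserved within a group and absent from all other groups.

    groups maps a (hypothetical) subfamily label to a list of sequences.
    A k-mer is kept if it occurs in at least min_fraction of the group's
    sequences and in no sequence of any other group.
    """
    counts = {
        name: Counter(km for seq in seqs for km in kmers(seq, k))
        for name, seqs in groups.items()
    }
    result = {}
    for name, seqs in groups.items():
        seen_elsewhere = set()
        for other, other_counts in counts.items():
            if other != name:
                seen_elsewhere.update(other_counts)
        result[name] = {
            km
            for km, n in counts[name].items()
            if n / len(seqs) >= min_fraction and km not in seen_elsewhere
        }
    return result


def annotate(seq, motifs, k, min_hits=2):
    """Assign seq to the group whose characteristic k-mers it matches most.

    Returns (label, number_of_hits), or (None, 0) below the min_hits cutoff.
    """
    query = kmers(seq, k)
    hits = {name: len(query & kms) for name, kms in motifs.items()}
    best = max(hits, key=hits.get)
    return (best, hits[best]) if hits[best] >= min_hits else (None, 0)


# Toy data: two made-up subfamilies with two short sequences each.
toy_groups = {
    "famA_sub1": ["MKTAYWG", "AKTAYWC"],
    "famA_sub2": ["MGDPLHE", "QGDPLHR"],
}
motifs = characteristic_kmers(toy_groups, k=3)
prediction = annotate("SSKTAYWPP", motifs, k=3)
```

In this toy setting the matched k-mers double as the reported motif for the prediction, which mirrors the abstract's point that k-mer-based tools can show which short peptides support a call. The real tool's grouping, scoring and thresholds may differ substantially.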
Year: 2020
DOI: 10.1093/bioinformatics/btz908
Venue: BIOINFORMATICS
DocType: Journal
Volume: 36
Issue: 7
ISSN: 1367-4803
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name              Order  Citations  PageRank
Jing Xu           1      0          2.03
Han Zhang         2      7          5.29
Jinfang Zheng     3      0          0.68
Philippe Dovoedo  4      0          0.34
Yanbin Yin        5      31         7.75