Title
Motifs tree: a new method for predicting post-translational modifications.
Abstract
Motivation: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, ranging from manually detected patterns to machine learning methods. On the one hand, manual pattern detection does not yield the most efficient classifiers and requires a substantial workload; on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. We therefore developed a novel method based on pattern discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white-box classifiers. Our method was tested on initiator methionine cleavage (IMC) and N-alpha-terminal acetylation (N-Ac), two of the most common PTMs.
Results: The resulting classifiers perform well when compared with existing models. On a set of eukaryotic proteins, they display a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers perform better than the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and demonstrate that we are able to extract experimentally known facts without prior knowledge. These results support the claim that our method produces white-box models.
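For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the Results, the following is a minimal sketch of how it is computed from the four counts of a binary confusion matrix. The function name and the example counts are illustrative assumptions; they are not taken from the paper, and this is not the authors' evaluation code.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for binary confusion-matrix counts.

    The value ranges from -1 (total disagreement) to +1 (perfect prediction).
    By convention, 0.0 is returned when the denominator is zero.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only (not results from the paper).
mcc_example = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(mcc_example, 3))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when the modified and unmodified classes are unbalanced.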
Year
2014
DOI
10.1093/bioinformatics/btu165
Venue
BIOINFORMATICS
Field
Data mining, Decision tree, Computer science, White box, Posttranslational modification, Artificial intelligence, Bioinformatics, Machine learning, Genetic algorithm
DocType
Journal
Volume
30
Issue
14
ISSN
1367-4803
Citations
2
PageRank
0.47
References
5
Authors
4
Name                    Order   Citations   PageRank
Christophe Charpilloz   1       2           0.47
Anne-lise Veuthey       2       322         30.70
Bastien Chopard         3       503         102.87
Jean-luc Falcone        4       112         16.94