Title
MDD-SOH: exploiting maximal dependence decomposition to identify S-sulfenylation sites with substrate motifs.
Abstract
S-sulfenylation (S-sulphenylation, or sulfenic acid), the covalent attachment of S-hydroxyl (-SOH) to a cysteine thiol, plays a significant role in the redox regulation of protein function. Although sulfenic acid is transient and labile, most of its physiological activities occur under the control of S-hydroxylation. Therefore, identifying the substrate sites of S-sulfenylated proteins is an essential task in computational biology for advancing the understanding of protein structure and function. Research into S-sulfenylated proteins is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. Using a total of 1096 experimentally verified S-sulfenylated proteins from humans, this study carries out a bioinformatics investigation of SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo analysis indicates that the positively and negatively charged amino acids flanking the SOH sites may affect the formation of S-sulfenylation in closed three-dimensional environments. In addition, the substrate motifs of SOH sites are studied using maximal dependence decomposition (MDD). Treating the task as binary classification between SOH and non-SOH sites, a support vector machine (SVM) is applied to learn a predictive model from the MDD-identified substrate motifs. In 5-fold cross-validation, the integrated SVM model learned from substrate motifs yields an average accuracy of 0.87, significantly improving the prediction of SOH sites. Furthermore, the integrated SVM model also improves predictive performance on an independent testing set. Finally, the integrated SVM model is used to implement a web resource, named MDD-SOH, that identifies SOH sites together with their corresponding substrate motifs.
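The abstract mentions features based on the amino acids flanking candidate cysteine sites. The following is a minimal sketch of how such a feature could be computed; it is not the authors' implementation. The window half-width (`flank`), the padding character, and the function names `cysteine_windows` and `composition` are assumptions made for illustration only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed placeholder for positions beyond the sequence ends


def cysteine_windows(sequence, flank=10):
    """Yield (1-based position, window) for every cysteine in the sequence.

    Each window has length 2 * flank + 1 and is centred on the cysteine.
    The default flank of 10 is an illustrative assumption.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    for i, residue in enumerate(seq):
        if residue == "C":
            yield i + 1, padded[i:i + 2 * flank + 1]


def composition(window):
    """Return the 20 amino acid frequencies of the flanking residues.

    The central residue and any padding characters are excluded.
    """
    centre = len(window) // 2
    flanking = window[:centre] + window[centre + 1:]
    counts = Counter(r for r in flanking if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


if __name__ == "__main__":
    toy_sequence = "MKRCDEAGCLK"  # made-up sequence, not from the data set
    for position, window in cysteine_windows(toy_sequence, flank=3):
        print(position, window, composition(window))
```

Each resulting 20-value vector is a fixed-length encoding of one candidate site, which is the kind of input a binary classifier such as an SVM could be trained on once sites are labelled as SOH or non-SOH.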
Year: 2016
DOI: 10.1093/bioinformatics/btv558
Venue: BIOINFORMATICS
Field: Substrate (chemistry), Data mining, Sulfenic acid, Binary classification, Computer science, Amino acid composition, Support vector machine, Amino Acid Motifs, Computational biology, Decomposition, Protein structure
DocType: Journal
Volume: 32
Issue: 2
ISSN: 1367-4803
Citations: 5
PageRank: 0.42
References: 16
Authors: 4
Name              Order   Citations   PageRank
Van-Minh Bui      1       11          0.86
Cheng-Tsung Lu    2       5           0.42
Thi-Trang Ho      3       5           0.42
Tzong-Yi Lee      4       617         37.18