Title
Bastion3: a two-layer ensemble predictor of type III secreted effectors.
Abstract
Motivation: Type III secreted effectors (T3SEs) can be injected into the host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Because of their relevance to pathogen-host interactions, significant computational effort has gone into identifying T3SEs, and this work has in turn stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (in some cases also the C-terminus) rather than the proteins' complete sequences, and (ii) the underlying models, trained with classic algorithms, employ only a few features, most of which are extracted from sequence information alone. Achieving better T3SE prediction requires identifying more powerful, informative features and investigating how to integrate them effectively into a comprehensive model.
Results: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted the models' performance through a novel genetic algorithm (GA)-based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 performs much better than commonly used methods, with an ACC of 0.959, F-value of 0.958, MCC of 0.917 and AUC of 0.956, outperforming all other toolkits by more than 5.6% in ACC, 5.7% in F-value, 12.4% in MCC and 5.8% in AUC. Based on the proposed two-layer ensemble model, we further developed a user-friendly online toolkit to make T3SE prediction convenient for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
Availability and implementation: http://bastion3.erc.monash.edu/
Supplementary information: Supplementary data are available at Bioinformatics online.
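The benchmark figures quoted in the abstract (ACC, F-value, MCC and AUC) are standard binary-classification metrics. The sketch below shows one common way to compute them, under some assumptions: "F-value" is taken to mean the F1 score, labels are coded as 1 (T3SE) and 0 (non-T3SE), and AUC is computed as the pairwise ranking probability. The abstract does not state the exact definitions used for Bastion3, and the function names here are illustrative only, not part of the Bastion3 toolkit.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return ACC, F-value (assumed to be F1) and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    acc = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_value = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC is conventionally set to 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f_value, "MCC": mcc}


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, made up purely for illustration.
    labels = [1, 1, 1, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print(binary_metrics(labels, predicted))
    print(auc(labels, scores))
```

MCC takes all four confusion-matrix cells into account, which may be why the reported margin over other tools is larger for MCC than for ACC or F-value.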
Year: 2019
DOI: 10.1093/bioinformatics/bty914
Venue: BIOINFORMATICS
Field: Data mining, Computer science, Effector, Computational biology
DocType: Journal
Volume: 35
Issue: 12
ISSN: 1367-4803
Citations: 2
PageRank: 0.37
References: 8
Authors: 14
Name                     Order  Citations  PageRank
Jiawei Wang              1      37         11.22
Jiahui Li                2      2          0.37
Bingjiao Yang            3      23         1.82
Ruopeng Xie              4      5          2.44
Tatiana T. Marquez-Lago  5      77         9.01
André Leier              6      197        19.87
Morihiro Hayashida       7      154        21.88
Tatsuya Akutsu           8      2169       216.05
Yanju Zhang              9      11         2.89
Kuo-Chen Chou            10     946        64.26
Joel Selkrig             11     2          0.37
Tieli Zhou               12     2          0.71
Jiangning Song           13     374        41.93
Trevor Lithgow           14     24         2.16