Title
On the limits of computational functional genomics for bacterial lifestyle prediction.
Abstract
We review the level of genomic specificity of actinobacterial pathogenicity. Because actinobacteria occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogens (NPs). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) BPs and exclusive HPs cannot be distinguished from each other because of an observation bias, i.e. many HPs might be as yet unclassified BPs. (H4) OPs have no intrinsic genomic characteristic that sets them apart from pathogens, as small mutations are likely to play a more dominant role in surviving the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). Nor can HPs be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might also target a broad range of mammals without having been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.
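To make the idea of "pathogenicity signature genes" concrete, the following is a minimal sketch of how candidate orthologous groups could be ranked by how differently they occur in two classes. It is not the pipeline described in the abstract, which uses Random Forest with feature selection; a simple presence-frequency difference is swapped in here as a stand-in. The genome identifiers, orthologous group names (OG1 to OG4), toy data, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper):
# genome id -> (class label, set of orthologous groups present in that genome).
GENOMES = {
    "g1": ("pathogen", {"OG1", "OG2", "OG4"}),
    "g2": ("pathogen", {"OG1", "OG3"}),
    "g3": ("pathogen", {"OG1", "OG2"}),
    "g4": ("non-pathogen", {"OG2", "OG3"}),
    "g5": ("non-pathogen", {"OG3", "OG4"}),
    "g6": ("non-pathogen", {"OG2", "OG3"}),
}


def presence_frequency(genomes, label):
    """Fraction of genomes with the given label that carry each orthologous group."""
    members = [groups for cls, groups in genomes.values() if cls == label]
    counts = defaultdict(int)
    for groups in members:
        for og in groups:
            counts[og] += 1
    return {og: n / len(members) for og, n in counts.items()}


def rank_signature_candidates(genomes, positive, negative):
    """Rank groups by absolute difference in presence frequency between classes.

    A univariate stand-in for feature selection (assumption): a positive score
    means the group is enriched in the `positive` class.
    """
    pos = presence_frequency(genomes, positive)
    neg = presence_frequency(genomes, negative)
    scores = {
        og: pos.get(og, 0.0) - neg.get(og, 0.0)
        for og in set(pos) | set(neg)
    }
    # Largest absolute difference first; ties broken by group name.
    return sorted(scores.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


ranking = rank_signature_candidates(GENOMES, "pathogen", "non-pathogen")
for og, score in ranking:
    print(f"{og}\t{score:+.2f}")
```

In this toy data, OG1 occurs in every pathogen and in no non-pathogen, so it ranks first. A univariate score like this ignores interactions between genes, which is one reason a multivariate learner such as Random Forest may be preferred in practice.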
Year: 2014
DOI: 10.1093/bfgp/elu014
Venue: BRIEFINGS IN FUNCTIONAL GENOMICS
Keywords: bioinformatics, machine learning, actinobacteria, pathogenicity
Field: Gene, Biology, Functional genomics, Statistical learning, Orthologous Gene, Computational biology, Bioinformatics, Pathogenicity, Genetics, Human pathogen, Sequence analysis
DocType: Conference
Volume: 13
Issue: 5
ISSN: 2041-2649
Citations: 4
PageRank: 0.51
References: 5
Authors: 5
Name                      Order   Citations   PageRank
Eudes Barbosa             1       4           0.85
Richard Röttger           2       23          4.97
Anne-Christin Hauschild   3       4           1.52
Vasco Azevedo             4       28          5.93
Jan Baumbach              5       148         22.11