Title
Using the nucleotide substitution rate matrix to detect horizontal gene transfer.
Abstract
Horizontal gene transfer (HGT) has allowed bacteria to evolve many new capabilities. Because transferred genes perform many medically important functions, such as conferring antibiotic resistance, improved detection of horizontally transferred genes from sequence data would be an important advance. Existing sequence-based methods for detecting HGT focus on changes in nucleotide composition or on differences between gene and genome phylogenies; these methods have high error rates. First, we introduce a new class of methods for detecting HGT based on the changes in nucleotide substitution rates that occur when a gene is transferred to a new organism. Our new methods discriminate simulated HGT events with an error rate up to 10 times lower than does GC content. Use of models that are not time-reversible is crucial for detecting HGT. Second, we show that using combinations of multiple predictors of HGT offers substantial improvements over using any single predictor, yielding as much as a factor of 18 improvement in performance (a maximum reduction in error rate from 38% to about 3%). Multiple predictors were combined by using the random forests machine learning algorithm to identify optimal classifiers that separate HGT from non-HGT trees. The new class of HGT-detection methods introduced here combines advantages of phylogenetic and compositional HGT-detection techniques. These new techniques offer order-of-magnitude improvements over compositional methods because they are better able to discriminate HGT from non-HGT trees under a wide range of simulated conditions. We also found that combining multiple measures of HGT is essential for detecting a wide range of HGT events. These novel indicators of horizontal transfer will be widely useful in detecting HGT events linked to the evolution of important bacterial traits, such as antibiotic resistance and pathogenicity.
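The contrast the abstract draws between a compositional signal (GC content) and a substitution-based signal can be illustrated with a minimal sketch. This is not the authors' method: it assumes an aligned ancestor/descendant sequence pair is available (for example, from ancestral reconstruction), and the function names `gc_content`, `substitution_counts`, and `asymmetry` are hypothetical. The asymmetry score relies on the fact that, for a time-reversible process at stationarity, the expected counts of i→j and j→i changes are equal, so a strongly unbalanced count matrix is the kind of pattern only a non-reversible model can represent.

```python
from collections import Counter

BASES = "ACGT"


def gc_content(seq):
    """Fraction of G or C among the unambiguous bases of seq."""
    seq = seq.upper()
    n = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / n if n else 0.0


def substitution_counts(ancestor, descendant):
    """4x4 matrix N where N[i][j] counts ancestor base i aligned to descendant base j.

    Columns containing gaps or ambiguity codes are ignored.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to equal length")
    pairs = Counter(zip(ancestor.upper(), descendant.upper()))
    return [[pairs.get((a, b), 0) for b in BASES] for a in BASES]


def asymmetry(counts):
    """Illustrative score: sum |N_ij - N_ji| / sum (N_ij + N_ji) over i < j.

    0 means forward and reverse changes balance; 1 means every observed
    change goes in one direction only.
    """
    num = den = 0
    for i in range(4):
        for j in range(i + 1, 4):
            num += abs(counts[i][j] - counts[j][i])
            den += counts[i][j] + counts[j][i]
    return num / den if den else 0.0


if __name__ == "__main__":
    anc = "ACGTACGTAC"
    des = "ACGTGCGTGC"  # two A->G changes, no G->A changes
    print(gc_content(anc), gc_content(des))
    print(asymmetry(substitution_counts(anc, des)))
```

In a setting like the one the abstract describes, scores of this kind would be only one of several predictors fed to a classifier such as a random forest; the sketch stops short of that step.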
Year: 2006
DOI: 10.1186/1471-2105-7-476
Venue: BMC Bioinformatics
Keywords: machine learning, time reversal, bioinformatics, markov chains, random forest, horizontal gene transfer, computer simulation, error rate, phylogeny, algorithms, computational biology, microarrays, gc content, antibiotic resistance, horizontal transfer, nucleotides
Field: Genome, Gene, Phylogenetic tree, Biology, Horizontal gene transfer, GC-content, Bioinformatics, Phylogenetics, Genetics, Random forest, DNA microarray
DocType:
Journal
Volume: 7
Issue: 1
ISSN: 1471-2105
Citations: 24
PageRank: 0.63
References: 5
Authors: 3
Name: Micah Hamady; Order: 1; Citations: 138; PageRank: 5.80
Name: M D Betterton; Order: 2; Citations: 24; PageRank: 0.63
Name: Rob Knight; Order: 3; Citations: 366; PageRank: 26.19