Title
The duplication-loss small phylogeny problem: from cherries to trees
Abstract
Reconstructing the history of genome-wide evolutionary events across a set of related organisms is of great biological interest, since it can help reveal the genomic basis of phenotypes. The sequencing of whole genomes facilitates the study of gene families, such as transfer RNA genes, whose size varies through duplication and loss events. However, high sequence similarity often makes it impossible to distinguish orthologs from paralogs. Previous methods have addressed this difficulty by considering the flanking regions of each family member independently. We go one step further and infer the order of the genes of (a set of) families in ancestral genomes from the order of these genes in sequenced genomes. We present a novel branch-and-cut algorithm that solves the two-species small phylogeny problem in the evolutionary model of duplications and losses. In experiments on six Vibrionaceae lineages, our implementation, DupLoCut, is on average approximately 200 times faster than a recently proposed method. Beyond this speedup, the efficiency of our approach allows us to extend our model from cherries of a species tree (that is, subtrees with two leaves) to the median-of-three-species setting. Computing the median of three species is key to one of the most common approaches to ancestral reconstruction, and our experiments show that computing it repeatedly considerably reduces the number of duplications and losses along the tree, both on simulated instances with 128 leaves and on a set of Bacillus genomes. Furthermore, our simulations show that this reduction in cost goes hand in hand with an improvement in the predicted ancestral genomes. Finally, we prove that the small phylogeny problem in the duplication-loss model is NP-complete even for two species.
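The abstract refers to the common strategy of reconstructing ancestors by repeatedly computing medians of three. The toy sketch below illustrates only that generic iterative-median heuristic, under loudly stated assumptions: it replaces the duplication-loss distance with Hamming distance on equal-length strings (for which the position-wise majority is an exact median), and it takes an unrooted binary tree in which every internal node has exactly three neighbours. It is not DupLoCut and does not model gene order, duplications, or losses; all function and node names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def median3(a, b, c):
    """Exact Hamming median of three strings, computed position by position.

    If x agrees with y or z it is the majority symbol; otherwise y is kept,
    which still attains the minimum per-position cost of 2.
    """
    return "".join(x if x == y or x == z else y for x, y, z in zip(a, b, c))


def tree_cost(adj, labels):
    """Sum of edge distances, counting each undirected edge once."""
    return sum(hamming(labels[u], labels[v])
               for u in adj for v in adj[u] if u < v)


def iterative_median(adj, leaf_labels, length, max_rounds=100):
    """Hypothetical iterative-median refinement of internal-node labels.

    adj: unrooted tree as an adjacency dict; internal nodes must have degree 3.
    leaf_labels: fixed labels of the leaves (observed genomes, here strings).
    """
    labels = dict(leaf_labels)
    internal = [v for v in adj if v not in leaf_labels]
    for v in internal:
        labels[v] = "0" * length  # arbitrary initial ancestor
    for _ in range(max_rounds):
        improved = False
        for v in internal:
            a, b, c = (labels[u] for u in adj[v])
            m = median3(a, b, c)
            # Only edges incident to v change, so compare their local cost.
            old = sum(hamming(labels[v], labels[u]) for u in adj[v])
            new = sum(hamming(m, labels[u]) for u in adj[v])
            if new < old:  # strict decrease guarantees termination
                labels[v] = m
                improved = True
        if not improved:
            break
    return labels


# Quartet ((A,B),(C,D)) with internal nodes X and Y.
adj = {
    "A": ["X"], "B": ["X"], "C": ["Y"], "D": ["Y"],
    "X": ["A", "B", "Y"], "Y": ["C", "D", "X"],
}
leaves = {"A": "1100", "B": "1110", "C": "0011", "D": "0111"}
labels = iterative_median(adj, leaves, length=4)
print(labels["X"], labels["Y"], tree_cost(adj, labels))  # → 1110 0111 4
```

In this simplified setting each median is cheap; the point of the paper is that for the duplication-loss model even the two-species case is NP-complete, so each median step requires an exact solver such as the branch-and-cut approach described above.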
Year
2013
DOI
10.1089/cmb.2013.0057
Venue
Journal of Computational Biology
Field
Genome, Gene, Biology, Transfer RNA, Homology (biology), Bioinformatics, Gene duplication, Phylogenetics, Gene family, Bacterial genome size
DocType
Journal
Volume
20
Issue
9
ISSN
1066-5277
Citations
1
PageRank
0.37
References
4
Authors
3
Name              Order  Citations  PageRank
Sandro Andreotti  1      9          1.93
Knut Reinert      2      1020       105.87
Stefan Canzar     3      116        14.05