Title
New Algorithms for Computing Phylogenetic Biodiversity.
Abstract
A common problem that appears in many case studies in ecology is the following: given a rooted phylogenetic tree T and a subset R of its leaf nodes, we want to compute the distance between the elements in R. A very popular distance measure that can be used for this reason is the Phylogenetic Diversity (PD), which is defined as the cost of the minimum weight Steiner tree in T that spans the nodes in R. To analyse the value of the PD for a given set R it is important also to calculate the variance of this measure. However, the best algorithm known so far for computing the variance of the PD is inefficient; for any input tree T that consists of n nodes, this algorithm has 0(n2) running time. Moreover, computing efficiently the variance and higher order statistical moments is a major open problem for several other phylogenetic measures. We provide the following results: We describe a new algorithm that computes efficiently in practice the variance of the PD. This algorithm has 0(SI(T)+D55I2(T)) running time; here SI(T) denotes the Sackin's Index of T, and DSSI(T) is a new index whose value depends on how balanced T is. We provide for the first time exact formulas for computing the mean and the variance of another popular biodiversity measure, the Mean Nearest Taxon Distance (MNTD). These formulas apply specifically to ultrametric trees. For an ultrametric tree T of n nodes, we show how we can compute the mean of the MNTD in O(n) time, and its variance in O(SI(T) DSSI2(T)) time. We introduce a new measure which we call the Core Ancestor Cost (CAC). A major advantage of this measure is that for any integer k > 0 we can compute all first k statistical moments of the CAC in 0(SI(T) nk+k(2)) time in total, using O(n+ k) space. We have implemented the new algorithms for computing the variance of the PD and of the MNTD, and the statistical moments of the CAC. We conducted experiments on large phylogenetic datasets and we show that our algorithms perform efficiently in practice.
Year
DOI
Venue
2014
10.1007/978-3-662-44753-6_15
ALGORITHMS IN BIOINFORMATICS
Field
DocType
Volume
Biodiversity,Combinatorics,Open problem,Phylogenetic diversity,Phylogenetic tree,Steiner tree problem,Computer science,Algorithm,Minimum weight,Bioinformatics,Method of moments (statistics)
Conference
8701
ISSN
Citations 
PageRank 
0302-9743
1
0.63
References 
Authors
3
3
Name
Order
Citations
PageRank
Constantinos Tsirogiannis1125.93
Brody Sandel242.21
Adrija Kalvisa310.63