Title
Estimating population diversity with unreliable low frequency counts.
Abstract
We consider the classical population diversity estimation scenario based on frequency count data (the number of classes or taxa represented once, twice, etc. in the sample), but with the proviso that the lowest frequency counts, especially the singletons, may not be reliably observed. This arises especially in data derived from modern high-throughput DNA sequencing, where errors may cause sequences to be incorrectly assigned to new taxa instead of being matched to existing, observed taxa. We look at a spectrum of methods for addressing this issue, focusing in particular on fitting a parametric mixture model and deleting the highest-diversity component; we also consider regarding the data as left-censored and effectively pooling two or more low frequency counts. We find that these purely statistical "downstream" corrections will depend strongly on their underlying assumptions, but that such methods can be useful nonetheless.
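As an illustration of the data structure the abstract describes, the following minimal Python sketch builds frequency counts from a list of observed taxon labels and then pools the lowest counts, as one would when treating them as left-censored. The function names `frequency_counts` and `pool_low_counts`, the `cutoff` parameter, and the toy sample are assumptions made for illustration; this is only the data preparation step, not the authors' estimation procedure.

```python
from collections import Counter

def frequency_counts(labels):
    """Return {j: f_j}, where f_j is the number of taxa observed exactly j times."""
    abundance = Counter(labels)                 # taxon -> number of observations
    f = Counter(abundance.values())             # j -> number of taxa seen j times
    return dict(sorted(f.items()))

def pool_low_counts(freq_counts, cutoff):
    """Pool all frequencies <= cutoff into one total (hypothetical helper).

    This mimics regarding the low counts as left-censored: only their
    combined total is retained, not the individual f_1, f_2, ...
    """
    pooled = sum(f for j, f in freq_counts.items() if j <= cutoff)
    kept = {j: f for j, f in freq_counts.items() if j > cutoff}
    return pooled, kept

# Toy sample (invented for illustration): five taxa, eleven observations.
sample = ["a", "a", "a", "b", "b", "c", "d", "e", "e", "e", "e"]
counts = frequency_counts(sample)
pooled, kept = pool_low_counts(counts, cutoff=2)
print(counts)        # frequency counts f_1, f_2, ...
print(pooled, kept)  # pooled low counts and the untouched higher counts
```

In this toy sample the singletons and doubletons are merged into a single pooled count, so a spurious singleton created by a sequencing error would change only that pooled total rather than an individual low frequency count.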
Year
2012
Venue
Biocomputing-Pacific Symposium on Biocomputing
Keywords
microbial diversity, mixture model, species problem, capture-recapture, left-censored data
Field
Population diversity, Open peer review, Physiology, Statistics, Medicine
DocType
Conference
ISSN
2335-6936
Citations
1
PageRank
0.47
References
0
Authors
4
Name              Order  Citations  PageRank
John Bunge        1      8          2.89
Dankmar Böhning   2      1          0.81
Heather K Allen   3      3          0.95
James A. Foster   4      353        61.38