Title
The impact of normalization and phylogenetic information on estimating the distance for metagenomes.
Abstract
Metagenomics enables the study of unculturable microorganisms in different environments directly. Discriminating between the compositional differences of metagenomes is an important and challenging problem. Several distance functions have been proposed to estimate the differences based on functional profiles or taxonomic distributions; however, the strengths and limitations of such functions are still unclear. Initially, we analyzed three well-known distance functions and found very little difference between them in the clustering of samples. This motivated us to incorporate suitable normalizations and phylogenetic information into the functions so that we could cluster samples from both real and synthetic data sets. The results indicate significant improvement in sample clustering over that derived by rank-based normalization with phylogenetic information, regardless of whether the samples are from real or synthetic microbiomes. Furthermore, our findings suggest that considering suitable normalizations and phylogenetic information is essential when designing distance functions for estimating the differences between metagenomes. We conclude that incorporating rank-based normalization with phylogenetic information into the distance functions helps achieve reliable clustering results.
Year: 2012
DOI: 10.1109/TCBB.2011.111
Venue: IEEE/ACM Trans. Comput. Biology Bioinform.
Keywords: well-known distance function, suitable normalization, distance function, rank-based normalization, phylogenetic information, synthetic data set, cluster sample, synthetic microbiomes, challenging problem, reliable clustering result, synthetic data, genomics, phylogeny, accuracy, microorganisms, clustering, cluster sampling, metagenomics, computational biology, reliability, bioinformatics, normalization, correlation, genetics
Field: Phylogenetic tree, Normalization (statistics), Computer science, Metagenomics, Genomics, Correlation, Bioinformatics, Cluster sampling, Phylogenetics, Cluster analysis
DocType: Journal
Volume: 9
Issue: 2
ISSN: 1557-9964
Citations: 1
PageRank: 0.37
References: 1
Authors: 7
Name                      Order  Citations  PageRank
Chien-Hao Su              1      27         2.27
Tse-Yi Wang               2      81         6.39
Ming-Tsung Hsu            3      34         4.10
Francis Cheng-Hsuan Weng  4      16         1.28
Cheng-yan Kao             5      586        61.50
Daryi Wang                6      18         1.76
Huai-Kuang Tsai           7      132        14.33