Title
Derived distribution points heuristic for fast pairwise statistical significance estimation
Abstract
Estimating the statistical significance of a pairwise sequence alignment is crucial in homology detection. A recent development in the field is the use of pairwise statistical significance as an alternative to database statistical significance. Although pairwise statistical significance has been shown to be potentially better than database statistical significance in terms of homology detection retrieval accuracy, it is currently very time consuming, since it involves generating an empirical score distribution by aligning one sequence of the sequence pair with N random shuffles of the other sequence. A high value of N produces (statistically and potentially biologically) accurate estimates, but also consumes more time. A low value of N leads to inaccurate fitting of the score distribution, and hence to poor estimates of statistical significance. In this paper, we propose a simple heuristic, called the Derived Distribution Points (DDP) heuristic, which is designed to take into account the features of the pairwise statistical significance estimation procedure, and which has been shown to significantly improve the quality of pairwise statistical significance estimates (evaluated in terms of retrieval accuracy) even when using low values of N. Alternatively, it can be thought of as speeding up pairwise statistical significance estimation with high values of N, where comparable performance is achieved while actually using a much lower number of random shuffles. Experiments indicate that a speed-up of up to a factor of 40 over current implementations can be achieved without loss in retrieval accuracy.
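The baseline shuffle-based procedure that the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation and not the DDP heuristic itself, whose details the abstract does not give. The scoring scheme (match/mismatch with a linear gap penalty), the Gumbel (extreme value) model fitted by the method of moments, and all function names are assumptions chosen for brevity.

```python
import math
import random

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score (assumed toy scoring scheme)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def shuffled_scores(a, b, n, rng):
    """Align a against n random shuffles of b; return the n scores."""
    chars = list(b)
    scores = []
    for _ in range(n):
        rng.shuffle(chars)
        scores.append(local_alignment_score(a, "".join(chars)))
    return scores

def fit_gumbel_moments(scores):
    """Method-of-moments Gumbel fit (an assumed, simplified fitting step)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi      # scale parameter
    mu = mean - 0.5772156649015329 * beta      # location (Euler-Mascheroni)
    return mu, beta

def pairwise_p_value(a, b, n=200, seed=0):
    """Return (observed score, estimated P(score >= observed))."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    mu, beta = fit_gumbel_moments(shuffled_scores(a, b, n, rng))
    if beta == 0.0:
        # Degenerate case: every shuffle produced the same score.
        return observed, (1.0 if observed <= mu else 0.0)
    z = max((observed - mu) / beta, -50.0)     # clamp to avoid overflow
    return observed, -math.expm1(-math.exp(-z))

if __name__ == "__main__":
    score, p = pairwise_p_value("ACGTACGTAC", "ACGTTCGTAC", n=200, seed=1)
    print(score, p)
```

The cost is dominated by the N alignments in `shuffled_scores`, which is why reducing the number of shuffles needed for a good fit, as the abstract proposes, translates directly into a speed-up.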
Year
2010
DOI
10.1145/1854776.1854819
Venue
BCB
Keywords
derived distribution points heuristic, retrieval accuracy, pairwise statistical significance estimation, database statistical significance, pairwise statistical significance estimate, low value, pairwise statistical significance, high value, fast pairwise statistical significance, speeding-up pairwise statistical significance, statistical significance, pairwise sequence alignment, longest increasing subsequence, multiple alignment, sequence alignment
Field
Pairwise comparison, Heuristic, Longest increasing subsequence, Pairwise sequence alignment, Statistical significance, Multiple sequence alignment, Statistics, Mathematics
DocType
Conference
Citations
3
PageRank
0.38
References
15
Authors
3
Name
Order
Citations
PageRank
Ankit Agrawal160759.22
Alok N. Choudhary224222.44
Xiaoqiu Huang338358.13