Paper Info

Title
Fast algorithms for computing sequence distances by exhaustive substring composition.

Abstract
The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably, in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes as a paradigm the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
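
As a rough illustration of the idea in the last sentence of the abstract, the sketch below compares background-adjusted subword frequencies for a single fixed word length `k`. It is not the paper's algorithm: the paper covers all subword lengths and is concerned with doing so quickly, whereas this version is naive and handles one `k` at a time. The Markov-style expected frequency, the cosine-based distance, and the function names `kmer_freqs`, `composition_vector` and `distance` are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequency of every length-k substring of seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {w: c / total for w, c in counts.items()}


def composition_vector(seq, k):
    """Background-adjusted scores (observed - expected) / expected per k-mer.

    Assumption: the background is a Markov-style estimate built from the
    two overlapping (k-1)-mers and the (k-2)-mer they share.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)
    vec = {}
    for w, observed in f_k.items():
        expected = f_k1[w[:-1]] * f_k1[w[1:]] / f_k2[w[1:-1]]
        vec[w] = (observed - expected) / expected
    return vec


def distance(a, b, k=3):
    """Distance in [0, 1] from the cosine of the two score vectors."""
    va, vb = composition_vector(a, k), composition_vector(b, k)
    dot = sum(x * vb.get(w, 0.0) for w, x in va.items())
    na = sqrt(sum(x * x for x in va.values()))
    nb = sqrt(sum(x * x for x in vb.values()))
    if na == 0 or nb == 0:
        # Degenerate case: a vector with no signal has no direction.
        return 0.0 if na == nb else 0.5
    return (1 - dot / (na * nb)) / 2


if __name__ == "__main__":
    print(distance("ACGTACGTTGCA", "TTGACCGTAGGA", k=3))
```

Each call rescans both sequences for every word length, so repeating this over all values of `k` becomes expensive on genome-sized inputs; that cost is the concern the abstract raises.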

Year	2008
DOI	10.1186/1748-7188-3-13
Venue	Algorithms for Molecular Biology
Keywords	bioinformatics, algorithms, biomedical research, phylogenetic tree, sequence analysis
Field	Substring, Computational problem, Phylogenetic tree, Suffix, Computer science, Algorithm, Theoretical computer science, Throughput, Bioinformatics, Suffix tree, Fraction (mathematics), Sequence analysis
DocType	Journal
Volume	3
Issue	1
ISSN	1748-7188
Citations	10
PageRank	0.56
References	9
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Alberto Apostolico	1	1441	182.20
Olgert Denas	2	11	1.25
