Common substrings in random strings - Citegraph

Paper Info

Title
Common substrings in random strings

Abstract
In computational biology, an important problem is to identify a word of length k present in each of a given set of sequences. Here, we investigate the problem of calculating the probability that such a word exists in a set of r random strings. Existing methods to approximate this probability are either inaccurate when r 2 or are restricted to Bernoulli models. We introduce two new methods for computing this probability under Bernoulli and Markov models. We present generalizations of the methods to compute the probability of finding a word of length k shared among q of r sequences, and to allow mismatches. We show through simulations that our approximations are significantly more accurate than methods previously published.

Year	DOI	Venue
2006	10.1007/11780441_13	CPM
Keywords	Field	DocType
common substrings,markov model,length k present,important problem,present generalization,r random string,new method,computational biology,bernoulli model,r sequence,length k	Discrete mathematics,Combinatorics,Substring,Markov model,Generalization,Bernoulli process,String (computer science),Mathematics,Bernoulli's principle	Conference
Volume	ISSN	ISBN
4009	0302-9743	3-540-35455-7
Citations	PageRank	References
0	0.34	7
Authors
2

Authors (2 rows)

Cited by (0 rows)

References (7 rows)

Name	Order	Citations	PageRank
Eric Blais	1	286	22.49
Mathieu Blanchette	2	631	62.65

1