Title
Expectation of Strings with Mismatches under Markov Chain Distribution
Abstract
We study a problem related to the extraction of over-represented words from a given source text x , of length n . The words are allowed to occur with k mismatches, and x is produced by a source over an alphabet Σ according to a Markov chain of order p . We propose an online algorithm to compute the expected number of occurrences of a word y of length m in O (mk |Σ| p + 1). We also propose an offline algorithm to compute the probability of any word that occurs in the text in O (k |Σ|2) after O (nk |Σ| p + 1) pre-processing. This algorithm allows us to compute the expectation for all the words in a text of length n in O (kn 2|Σ|2 + nk |Σ| p + 1), rather than in O (n 3 |Σ| p + 1) that can be obtained with other methods. Although this study was motivated by the motif discovery problem in bioinformatics, the results find their applications in any other domain involving combinatorics on words.
Year
DOI
Venue
2009
10.1007/978-3-642-03784-9_22
SPIRE
Keywords
Field
DocType
offline algorithm,motif discovery problem,over-represented word,order p,online algorithm,source text,length n,k mismatches,markov chain distribution,word y,length m,combinatorics on words,markov chain
Online algorithm,Combinatorics,Markov chain,Expected value,Combinatorics on words,Mathematics,Alphabet
Conference
Volume
ISSN
Citations 
5721
0302-9743
3
PageRank 
References 
Authors
0.42
7
2
Name
Order
Citations
PageRank
Cinzia Pizzi113915.73
Mauro Bianco2938.10