Title
Optimal Algorithm for Finding DNA Motifs with Nucleotide Adjacent Dependency
Abstract
Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMM or regular expression, exist that can capture the nucleotide dependency. Unfortunately, these models are not practical (with too many parameters and require many known binding sites). Recently, Chin and Leung introduced the SPSP representation which overcomes the limitations of these complicated models. However, discovering novel motifs in SPSP representation is still a NP-hard problem. In this paper, based on our observations in real binding sites, we propose a simpler model, the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length- 500 DNA sequences within a few minutes and the DPS representation has a similar performance as SPSP representation.
Year
Venue
Keywords
2008
APBC
matrix representation,regular expression,dna sequence,gene expression,branch and bound algorithm,np hard problem,nucleotides,binding site
Field
DocType
Citations 
Branch and bound,Regular expression,Binding site,Matrix (mathematics),Algorithm,Motif (music),DNA,DNA sequencing,Bioinformatics,Hidden Markov model,Mathematics
Conference
1
PageRank 
References 
Authors
0.38
19
4
Name
Order
Citations
PageRank
Francis Y. L. Chin12173377.15
Henry C. M. Leung250336.59
Manhung Siu346461.40
Siu-ming Yiu4358.28