Abstract | ||
---|---|---|
DNA motif discovery is an important problem for deciphering gene regulation. Motifs usually contain gaps (spaced motifs) and are more complex than contiguously conserved (monad) patterns. Existing algorithms mostly address monad motifs, and methods for spaced motifs impose various constraints on gaps, which may hinder the discovery of complex motifs. In this paper, we propose the Genetic Algorithm (GA) for Spaced Motifs Elicitation on Nucleotides (GASMEN), which searches over a wide range of possible widths (4-25) and relaxes substantial constraints. GASMEN employs submotif indexing to partition the search space into smaller subspaces so that the GA can reach optimality more easily. Multiple-motif control is employed, and probabilistic refinements are proposed to improve motif quality. Preliminary results on real spaced motifs demonstrate that GASMEN shows promise in finding more accurate motifs and optimal widths than the state-of-the-art method, SPACE. GASMEN is also capable of finding monad motifs, outperforming both Weeder and SPACE on most of the 8 real datasets. |
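
To make the distinction between monad and spaced motifs in the abstract concrete, the following is a minimal Python sketch of scanning a sequence for a two-part spaced motif under a Hamming-distance tolerance. It is an illustration only and not the GASMEN algorithm: the function names `hamming` and `find_spaced_motif`, the fixed gap length, and the shared mismatch budget are all assumptions introduced here.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_spaced_motif(seq: str, left: str, right: str, gap: int,
                      max_mismatches: int = 1) -> list[int]:
    """Return start positions where `left`, then `gap` unconstrained bases,
    then `right` occur in `seq`.

    Hypothetical helper: the mismatch budget is shared across both
    conserved parts, and the gap positions are never scored.
    """
    width = len(left) + gap + len(right)
    hits = []
    for start in range(len(seq) - width + 1):
        left_site = seq[start:start + len(left)]
        right_start = start + len(left) + gap
        right_site = seq[right_start:right_start + len(right)]
        mismatches = hamming(left_site, left) + hamming(right_site, right)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits
```

Setting `gap=0` reduces the scan to a contiguous (monad) motif of width `len(left) + len(right)`. A search method in the spirit of the abstract would additionally have to choose the widths and gap itself rather than take them as inputs.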
Year | DOI | Venue |
---|---|---|
2010 | 10.1109/CEC.2010.5585924 | 2010 IEEE Congress on Evolutionary Computation (CEC) |
Keywords | Field | DocType |
hamming distance,genetic algorithm,pulse width modulation,nucleotides,gene regulation,indexation,dna,probabilistic logic,search space,genetic algorithms,genetics | Computer science,Sequence motif,Search engine indexing,Theoretical computer science,Hamming distance,Artificial intelligence,Aerospace electronics,Probabilistic logic,Partition (number theory),Machine learning,Genetic algorithm,Monad (functional programming) | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tak-ming Chan | 1 | 190 | 13.57 |
Kwong-Sak Leung | 2 | 1887 | 205.58 |
Kin-Hong Lee | 3 | 257 | 26.27 |
Pietro Liò | 4 | 550 | 99.98 |