Title
EXTREME: an online EM algorithm for motif discovery.
Abstract
Motivation: Identifying regulatory elements is a fundamental problem in the field of gene transcription. Motif discovery (the task of identifying the sequence preference of transcription factor proteins, which bind to these elements) is an important step in this challenge. MEME is a popular motif discovery algorithm. Unfortunately, MEME's running time scales poorly with the size of the dataset. Experiments such as ChIP-Seq and DNase-Seq are providing a rich amount of information on the binding preference of transcription factors. MEME cannot discover motifs in data from these experiments in a practical amount of time without a compromising strategy such as discarding a majority of the sequences.
Results: We present EXTREME, a motif discovery algorithm designed to find DNA-binding motifs in ChIP-Seq and DNase-Seq data. Unlike MEME, which uses the expectation-maximization algorithm for motif discovery, EXTREME uses the online expectation-maximization algorithm to discover motifs. EXTREME can discover motifs in large datasets in a practical amount of time without discarding any sequences. Using EXTREME on ChIP-Seq and DNase-Seq data, we discover many motifs, including some novel and infrequent motifs that can only be discovered by using the entire dataset. Conservation analysis of one of these novel infrequent motifs confirms that it is evolutionarily conserved and possibly functional.
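The contrast the abstract draws between batch EM and online EM can be illustrated with a toy example. The sketch below is not EXTREME's implementation and is not taken from the paper. It assumes a simplified two-component mixture over fixed-width windows (a position weight matrix versus a uniform background), a hypothetical seed word to initialize the matrix, and a decaying step-size schedule chosen purely for illustration. All function and parameter names (`seed_pwm`, `online_em_motif`, `consensus`, `t0`, `exponent`, `eps`) are invented for this example. The point it shows is that each window is visited once and the parameters are refreshed after every window, instead of after a full pass over the data.

```python
import random

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def seed_pwm(seed_word, seed_weight=0.5):
    """Build a starting PWM that favors each letter of a seed word (hypothetical helper)."""
    rest = (1.0 - seed_weight) / (len(ALPHABET) - 1)
    return [[seed_weight if b == c else rest for b in ALPHABET] for c in seed_word]


def online_em_motif(windows, seed_word, motif_frac=0.1, t0=20, exponent=0.6, eps=1e-3):
    """One streaming pass of online EM over windows as long as seed_word.

    Model (an assumption of this sketch): each window comes either from a
    motif PWM (with probability lam) or from a uniform background.
    """
    width = len(seed_word)
    pwm = seed_pwm(seed_word)
    lam = motif_frac
    # Running sufficient statistics: E[z] and E[z * 1{letter b at position j}].
    s_lam = lam
    s_counts = [[lam * p for p in row] for row in pwm]
    background = 0.25 ** width

    for t, window in enumerate(windows):
        letters = [INDEX[b] for b in window]
        # E-step for this single window: posterior that it is a motif site.
        p_motif = lam
        for j, a in enumerate(letters):
            p_motif *= pwm[j][a]
        p_bg = (1.0 - lam) * background
        z = p_motif / (p_motif + p_bg)

        # Blend the new expectation into the running statistics with a decaying step.
        step = (t + t0) ** (-exponent)
        s_lam = (1.0 - step) * s_lam + step * z
        for j, a in enumerate(letters):
            for b in range(len(ALPHABET)):
                hit = z if b == a else 0.0
                s_counts[j][b] = (1.0 - step) * s_counts[j][b] + step * hit

        # M-step: re-estimate the parameters from the running statistics.
        lam = min(max(s_lam, 1e-6), 1.0 - 1e-6)
        pwm = [[(c + eps) / (s_lam + len(ALPHABET) * eps) for c in row] for row in s_counts]

    return pwm, lam


def consensus(pwm):
    """Most probable letter at each PWM position."""
    return "".join(ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in pwm)


# Synthetic demo: roughly 40% of windows are exact copies of a planted word.
rng = random.Random(0)
planted = "ACGTAC"
windows = [
    planted if rng.random() < 0.4 else "".join(rng.choice(ALPHABET) for _ in planted)
    for _ in range(2000)
]
pwm, lam = online_em_motif(windows, seed_word=planted)
print(consensus(pwm), round(lam, 2))
```

In this demo the seed word is the planted word itself, which makes the toy problem easy. On real data a seed would have to come from a separate seeding step that is not shown here. Memory use is independent of the number of windows because only the running statistics are kept, which is the property that makes an online scheme attractive for large datasets.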
Year
2014
DOI
10.1093/bioinformatics/btu093
Venue
BIOINFORMATICS
Field
Data mining, Source code, Computer science, Expectation–maximization algorithm, Sequence motif, Motif (music), Nucleotide Motif, Bioinformatics, Multiple EM for Motif Elicitation
DocType
Journal
Volume
30
Issue
12
ISSN
1367-4803
Citations
3
PageRank
0.42
References
11
Authors
2
Name            Order   Citations   PageRank
Daniel Quang    1       47          3.23
Xiaohui Xie     2       61          5.50