Title
On counting position weight matrix matches in a sequence, with application to discriminative motif finding.
Abstract
The position weight matrix (PWM) is a popular method to model transcription factor binding sites. A fundamental problem in cis-regulatory analysis is to "count" the occurrences of a PWM in a DNA sequence. We propose a novel probabilistic score to solve this problem of counting PWM occurrences. The proposed score has two important properties: (1) It gives appropriate weights to both strong and weak occurrences of the PWM, without using thresholds. (2) For any given PWM, this score can be computed while allowing for occurrences of other, a priori known PWMs, in a statistically sound framework. Additionally, the score is efficiently differentiable with respect to the PWM parameters, which has important consequences for designing search algorithms. The second problem we address is to find, ab initio, PWMs that have high counts in one set of sequences, and low counts in another. We develop a novel algorithm to solve this "discriminative motif-finding problem", using the proposed score for counting a PWM in the sequences. The algorithm is a local search technique that exploits derivative information on an objective function to enhance speed and performance. It is extensively tested on synthetic data, and shown to perform better than other discriminative as well as non-discriminative PWM finding algorithms. It is then applied to cis-regulatory modules involved in development of the fruit fly embryo, to elicit known and novel motifs. We finally use the algorithm on genes predictive of social behavior in the honey bee, and find interesting motifs. The program is available upon request from the author.
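To make the idea of threshold-free counting concrete, the following is a minimal sketch of a "soft count": each window of the sequence contributes its posterior probability of being a site rather than a 0/1 call against a cutoff. This is a simplified stand-in and not the score proposed in the paper, whose exact form is not given in the abstract. The two-component mixture, the `prior` parameter, the function names, and the example PWM are all assumptions made for illustration.

```python
# Illustrative sketch only: a simplified soft count, NOT the paper's score.
# Assumptions: a PWM is a list of per-position dicts of base probabilities,
# the background is a single dict of base probabilities, and `prior` is a
# hypothetical per-window probability that a site starts there.

def window_likelihood(window, pwm):
    """Probability of the window under the PWM (product over columns)."""
    p = 1.0
    for base, column in zip(window, pwm):
        p *= column[base]
    return p


def background_likelihood(window, background):
    """Probability of the window under an i.i.d. background model."""
    p = 1.0
    for base in window:
        p *= background[base]
    return p


def soft_count(sequence, pwm, background, prior=0.01):
    """Sum, over all windows, of the posterior probability of a site.

    Strong matches contribute close to 1, weak matches contribute a small
    positive amount, and no threshold is applied anywhere.
    """
    width = len(pwm)
    total = 0.0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        motif = prior * window_likelihood(window, pwm)
        bg = (1.0 - prior) * background_likelihood(window, background)
        total += motif / (motif + bg)
    return total


# Hypothetical example data: a 3-column PWM that prefers "ACG".
def _column(preferred):
    return {b: (0.85 if b == preferred else 0.05) for b in "ACGT"}

example_pwm = [_column("A"), _column("C"), _column("G")]
uniform_background = {b: 0.25 for b in "ACGT"}

with_sites = soft_count("TTACGTTACGTT", example_pwm, uniform_background)
without_sites = soft_count("TTTTTTTTTTTT", example_pwm, uniform_background)
print(with_sites, without_sites)
```

Because this soft count is a smooth function of the PWM entries, it can in principle be differentiated with respect to them, which is the property the abstract highlights as useful for local search. The sketch ignores overlapping sites and competing PWMs, both of which the abstract says the proposed score handles.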
Year
2006
DOI
10.1093/bioinformatics/btl227
Venue
ISMB (Supplement of Bioinformatics)
Keywords
novel algorithm, discriminative motif-finding problem, non-discriminative pwm, fundamental problem, novel motif, pwm occurrence, pwm parameter, search algorithm, novel probabilistic score, position weight matrix, proposed score, motif finding, local search, gene prediction, objective function, embryos, transcription factor binding site, cis regulatory module, synthetic data, social behavior, dna sequence
Field
Search algorithm, Computer science, Position weight matrix, A priori and a posteriori, Synthetic data, Differentiable function, Bioinformatics, Probabilistic logic, Local search (optimization), Discriminative model
DocType
Conference
Volume
22
Issue
14
ISSN
1367-4811
Citations
22
PageRank
1.31
References
10
Authors
1
Name
Saurabh Sinha
Order
1
Citations
529
PageRank
48.96