Title
Modeling ChIP Sequencing In Silico with Applications
Abstract
ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.
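The simulation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-size bins, the function names (`simulate_tags`, `tag_count_histogram`), all parameter values, and the choice of a gamma distribution for binding-site affinities are assumptions made here for illustration. Only the gamma-distributed background comes from the abstract.

```python
import random
from collections import Counter


def simulate_tags(genome_len, n_tags, n_sites, site_fraction=0.2,
                  shape=0.5, scale=2.0, bin_size=100, seed=0):
    """Place sequence tags into genomic bins (hypothetical toy model).

    Background tags land in bins with probability proportional to
    gamma-distributed weights, giving a nonuniform background.
    Binding-site tags land in randomly chosen site bins, weighted by
    per-site affinities (also gamma-distributed here, as an assumption).
    """
    rng = random.Random(seed)
    n_bins = genome_len // bin_size

    # Nonuniform background: one gamma-distributed weight per bin.
    bg_weights = [rng.gammavariate(shape, scale) for _ in range(n_bins)]

    # Binding sites: distinct bins with heterogeneous affinities.
    site_bins = rng.sample(range(n_bins), n_sites)
    site_weights = [rng.gammavariate(shape, scale) for _ in site_bins]

    n_site_tags = round(n_tags * site_fraction)
    n_bg_tags = n_tags - n_site_tags

    # Draw bin indices for background tags, then for binding-site tags.
    counts = Counter(rng.choices(range(n_bins), weights=bg_weights, k=n_bg_tags))
    counts.update(rng.choices(site_bins, weights=site_weights, k=n_site_tags))
    return counts, set(site_bins)


def tag_count_histogram(counts):
    """Map each observed tag count to the number of bins with that count."""
    return Counter(counts.values())


if __name__ == "__main__":
    counts, sites = simulate_tags(genome_len=1_000_000, n_tags=20_000, n_sites=50)
    hist = tag_count_histogram(counts)
    for tag_count in sorted(hist)[:10]:
        print(tag_count, hist[tag_count])
```

Plotting the histogram on log-log axes is one way to compare a simulated tag-count distribution with the two-component shape (power law plus long right tail) that the abstract reports for real data. Bins here stand in for the tag clusters mentioned in the abstract, which is a simplification.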
Year: 2008
DOI: 10.1371/journal.pcbi.1000158
Venue: PLOS COMPUTATIONAL BIOLOGY
Keywords: computational biology, power law distribution, functional genomics, genome sequence, binding site, high throughput, statistical distributions, protein binding, dna binding proteins, computer simulation, chip, chromatin immunoprecipitation, cluster analysis, transcription factors, gamma distribution, genomics, transcription factor binding site, dna, binding sites
Field: Deep sequencing, Biology, ChIP-sequencing, Functional genomics, Genomics, Probability distribution, DNA sequencing, Gamma distribution, Bioinformatics, Genetics, In silico
DocType: Journal
Volume: 4
Issue: 8
ISSN: 1553-7358
Citations: 12
PageRank: 1.87
References: 3
Authors: 5
Name                  Order  Citations  PageRank
Zhengdong D. Zhang    1      42         6.96
Joel S. Rozowsky      2      12         2.54
Michael Snyder        3      138        26.15
Joseph T. Chang       4      12         1.87
Mark Gerstein         5      236        16.76