Abstract | ||
---|---|---|
RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than the transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically. We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20-25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed. |
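
The two ideas the abstract credits for the improved accuracy (allocating ambiguous reads with a statistical model, and reporting gene expression as the sum of isoform expression) can be illustrated with a small expectation-maximization sketch. This is not the authors' implementation: it assumes a deliberately simplified multinomial model in which each read lists the isoforms it is compatible with, and it ignores transcript length, read position, sequencing errors, and non-uniform read distributions. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def em_isoform_fractions(read_alignments, isoforms, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    read_alignments: list of tuples, each holding the distinct isoforms
    one read is compatible with (simplifying assumption).
    """
    if not read_alignments:
        raise ValueError("need at least one aligned read")
    # Start from a uniform guess over all isoforms.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(n_iter):
        counts = dict.fromkeys(isoforms, 0.0)
        # E-step: split each read across its candidate isoforms in
        # proportion to the current abundance estimates.
        for candidates in read_alignments:
            total = sum(theta[iso] for iso in candidates)
            if total == 0.0:
                continue  # read is compatible only with zero-abundance isoforms
            for iso in candidates:
                counts[iso] += theta[iso] / total
        # M-step: renormalize the expected counts into fractions.
        n = sum(counts.values())
        theta = {iso: c / n for iso, c in counts.items()}
    return theta


def gene_level(theta, isoform_to_gene):
    """Gene-level estimate as the sum of its isoform-level estimates."""
    genes = defaultdict(float)
    for iso, frac in theta.items():
        genes[isoform_to_gene[iso]] += frac
    return dict(genes)


# Toy data: 6 reads unique to A1, 2 unique to B1, 4 ambiguous between them.
reads = [("A1",)] * 6 + [("B1",)] * 2 + [("A1", "B1")] * 4
iso_to_gene = {"A1": "geneA", "A2": "geneA", "B1": "geneB"}
theta = em_isoform_fractions(reads, list(iso_to_gene))
genes = gene_level(theta, iso_to_gene)
```

In this toy case the ambiguous reads are shared according to the evidence from the uniquely mapping reads, rather than being discarded or split by a fixed rule.
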
Year | DOI | Venue |
---|---|---|
2010 | 10.1093/bioinformatics/btp692 | Bioinformatics |
Keywords | Field | DocType |
complicating expression analysis,maize rna-seq data,inference method,gene-level expression estimation,previous method,read length,expression estimation,gene expression level,rna-seq gene expression estimation,single read,isoform expression level,genome,gene expression profiling,statistical model,gene expression,algorithms,computational biology | Genome,Data mining,RNA-Seq,Computer science,Inference,Gene mapping,DNA sequencing,Statistical model,Bioinformatics,Reference genome,Gene expression profiling | Journal |
Volume | Issue | ISSN |
26 | 4 | 1367-4811 |
Citations | PageRank | References |
84 | 7.00 | 4 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bo Li | 1 | 578 | 45.93 |
Victor Ruotti | 2 | 123 | 15.30 |
Ron M Stewart | 3 | 121 | 10.18 |
James A Thomson | 4 | 140 | 19.20 |
Colin N. Dewey | 5 | 386 | 24.24 |