Title
A powerful and flexible approach to the analysis of RNA sequence count data.
Abstract
A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean-variance relationships provides a flexible testing regimen that 'borrows' information across genes, while easily incorporating design effects and additional covariates.
We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data, and (ii) an extension of an expression mean-variance modeling approach to RNA-Seq data, in which the overdispersion is modeled as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with alternative methods for handling RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility.
An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq
Contact: yzhou@bios.unc.edu; fwright@bios.unc.edu
Supplementary data are available at Bioinformatics online.
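To make the beta-binomial generalized linear model mentioned in the abstract concrete, the following is a minimal sketch of a per-gene log-likelihood with a logit link. It is not taken from the BBSeq package. The function names, the parameterization (mean probability p and overdispersion phi, with variance n p (1 - p) (1 + (n - 1) phi)), and the use of the total read count of each sample as the binomial size are assumptions made for illustration only.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function, computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(y, n, p, phi):
    """Log-probability of y successes in n trials under a beta-binomial.

    Assumed parameterization (illustrative, not necessarily BBSeq's):
    p is the mean probability and 0 < phi < 1 is the overdispersion.
    """
    a = p * (1.0 - phi) / phi
    b = (1.0 - p) * (1.0 - phi) / phi
    log_choose = (math.lgamma(n + 1) - math.lgamma(y + 1)
                  - math.lgamma(n - y + 1))
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def gene_loglik(counts, totals, design, coef, phi):
    """Log-likelihood for one gene across samples.

    counts: reads mapped to the gene in each sample
    totals: total reads in each sample (hypothetical choice of binomial size)
    design: one row of covariates per sample (factors coded numerically)
    coef:   regression coefficients on the logit scale
    """
    ll = 0.0
    for y, n, x in zip(counts, totals, design):
        eta = sum(xi * bi for xi, bi in zip(x, coef))
        p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit link
        ll += beta_binomial_logpmf(y, n, p, phi)
    return ll
```

Under these assumptions, coefficients and overdispersion for each gene could be estimated by maximizing `gene_loglik` numerically, and a group effect could be tested by comparing fits with and without the corresponding design column.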
Year: 2011
DOI: 10.1093/bioinformatics/btr449
Venue: Bioinformatics
Keywords: rna-seq data, microarray gene expression data, supplementary data, linear model, variance modeling approach, continuous covariates, flexible approach, rna sequence count data, additional covariates, explicit modeling, bbseq contact, supplementary information, gene expression profiling, transcriptome, gene expression
Field: Data mining, Anomaly detection, Overdispersion, Covariate, RNA Sequence, Computer science, Generalized linear model, Software, Count data, Bioinformatics, Sample size determination
DocType: Journal
Volume: 27
Issue: 19
ISSN: 1367-4811
Citations: 18
PageRank: 1.62
References: 4
Authors: 3
Name | Order | Citations | PageRank
Yihui Zhou | 1 | 34 | 6.71
Kai Xia | 2 | 34 | 3.33
Fred A Wright | 3 | 52 | 5.42