Title | ||
---|---|---|
IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly - (Extended Abstract) |
Abstract | ||
---|---|---|
New second-generation sequencing technology has revolutionized many biology-related research fields and poses various computational
biology challenges. One of them is transcriptome assembly based on RNA-Seq data, which aims at reconstructing all full-length
mRNA transcripts simultaneously from millions of short reads. In this paper, we consider three objectives in transcriptome
assembly: the maximization of prediction accuracy, minimization of interpretation, and maximization of completeness. The first objective, the maximization of prediction accuracy, requires that the estimated expression levels based on assembled
transcripts should be as close as possible to the observed ones for every expressed region of the genome. The second objective,
the minimization of interpretation, follows the parsimony principle and seeks as few transcripts in the prediction as possible. The third objective,
the maximization of completeness, requires that the maximum number of mapped reads (or “expressed segments” in gene models)
be explained by (i.e., contained in) the predicted transcripts in the solution. Based on the above three objectives, we present IsoLasso, a new
RNA-Seq based transcriptome assembly tool. IsoLasso is based on the well-known LASSO algorithm, a multivariate regression
method designed to seek a balance between the maximization of prediction accuracy and the minimization of interpretation.
By including some additional constraints in the quadratic program involved in LASSO, IsoLasso is able to make the set of assembled
transcripts as complete as possible. Experiments on simulated and real RNA-Seq datasets show that IsoLasso achieves higher
sensitivity and precision simultaneously than state-of-the-art transcript assembly tools.
|
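
The trade-off described in the abstract can be illustrated with a small sketch. This is not the IsoLasso implementation: it is a toy non-negative LASSO solved by coordinate descent, and the names `nonneg_lasso`, `uncovered_segments`, the matrix `A`, the vector `y`, and the penalty `lam` are all assumptions made for illustration. The abstract says completeness is enforced through additional constraints in the quadratic program; here it is only checked after the fit.

```python
# Toy sketch (hypothetical, not the IsoLasso code): fit segment expression
# levels y with candidate isoforms A under an L1 penalty and x >= 0.

def nonneg_lasso(A, y, lam, n_iter=500):
    """Coordinate descent for min 0.5*||y - A x||^2 + lam*sum(x), x >= 0."""
    n_seg, n_iso = len(A), len(A[0])
    x = [0.0] * n_iso
    residual = list(y)  # equals y - A x, since x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(n_seg)) for j in range(n_iso)]
    for _ in range(n_iter):
        for j in range(n_iso):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(A[i][j] * residual[i] for i in range(n_seg)) + col_sq[j] * x[j]
            new_xj = max(0.0, (rho - lam) / col_sq[j])  # shrink, then clip at 0
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_seg):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


def uncovered_segments(A, y, x, eps=1e-9):
    """Expressed segments not contained in any isoform with positive level."""
    return [
        i for i in range(len(A))
        if y[i] > 0 and not any(A[i][j] and x[j] > eps for j in range(len(x)))
    ]


# Rows are segments, columns are candidate isoforms (1 = segment is included).
A = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
# Observed levels generated from isoform 0 at level 2 and isoform 1 at level 1.
y = [3.0, 2.0, 3.0]

x = nonneg_lasso(A, y, lam=0.01)
print([round(v, 3) for v in x])
print(uncovered_segments(A, y, x))
```

On this toy input the estimate stays close to the generating levels for the first two isoforms, the third candidate is driven to zero by the penalty, and no expressed segment is left uncovered. Raising `lam` would favor fewer transcripts at the cost of a looser fit, which is the balance the abstract attributes to LASSO.
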
Year | DOI | Venue |
---|---|---|
2011 | 10.1007/978-3-642-20036-6_18 | RECOMB |
Keywords | Field | DocType |
multivariate regression,quadratic program,computational biology | Data mining,Biology,Regression,RNA-Seq,Transcriptome,Lasso (statistics),Minification,Bioinformatics,Quadratic programming,Completeness (statistics),Maximization | Conference |
ISSN | Citations | PageRank |
1611-3349 | 5 | 0.63 |
References | Authors | |
11 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Wei Li | 1 | 50 | 4.38 |
Jianxing Feng | 2 | 92 | 9.02 |
Tao Jiang | 3 | 1809 | 155.32 |