Title
BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences.
Abstract
Motivation: De novo transcriptome assemblies of RNA-Seq data are important for genomics applications of unsequenced organisms. Owing to the complexity and often incomplete representation of transcripts in sequencing libraries, the assembly of high-quality transcriptomes can be challenging. However, with the rapidly growing number of sequenced genomes, it is now feasible to improve RNA-Seq assemblies by guiding them with genomic sequences.
Results: This study introduces BRANCH, an algorithm designed for improving de novo transcriptome assemblies by using genomic information that can be partial or complete genome sequences from the same or a related organism. Its input includes assembled RNA reads (transfrags), genomic sequences (e.g. contigs) and the RNA reads themselves. It uses a customized version of BLAT to align the transfrags and RNA reads to the genomic sequences. After identifying exons from the alignments, it defines a directed acyclic graph and maps the transfrags to paths on the graph. It then joins and extends the transfrags by applying an algorithm that solves a combinatorial optimization problem, called the Minimum weight Minimum Path Cover with given Paths. In performance tests on real data from Caenorhabditis elegans and Saccharomyces cerevisiae, assisted by genomic contigs from the same species, BRANCH improved the sensitivity and precision of transfrags generated by Velvet/Oases or Trinity by 5.1-56.7% and 0.3-10.5%, respectively. These improvements added 3.8-74.1% complete transcripts and 8.3-3.8% proteins to the initial assembly. Similar improvements were achieved when guiding the BRANCH processing of a transcriptome assembly from a more complex organism (mouse) with genomic sequences from a related species (rat).
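The path-cover idea mentioned in the abstract can be illustrated with a small sketch. The code below is not the BRANCH algorithm: it solves only the classical, unweighted, vertex-disjoint minimum path cover on a DAG via bipartite matching, and it ignores both the weights and the "given paths" constraint of the problem named in the abstract. The function name `min_path_cover`, the toy exon graph, and the integer exon labels are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def min_path_cover(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Classical minimum vertex-disjoint path cover of a DAG with nodes 0..n-1.

    Simplified stand-in (an assumption, not the method from the paper):
    each node is split into an "out" side and an "in" side, and a maximum
    bipartite matching between the two sides chains nodes into paths.
    """
    adj: Dict[int, List[int]] = {u: [] for u in range(n)}
    for u, v in edges:
        adj[u].append(v)

    match_next = [-1] * n  # match_next[u]: successor of u on its path
    match_prev = [-1] * n  # match_prev[v]: predecessor of v on its path

    def try_augment(u: int, seen: Set[int]) -> bool:
        # Kuhn's augmenting-path search starting from the "out" copy of u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_prev[v] == -1 or try_augment(match_prev[v], seen):
                match_next[u] = v
                match_prev[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Every node without a matched predecessor starts one path.
    paths: List[List[int]] = []
    for start in range(n):
        if match_prev[start] == -1:
            path = [start]
            while match_next[path[-1]] != -1:
                path.append(match_next[path[-1]])
            paths.append(path)
    return paths


# Hypothetical exon graph: exon 1 splices to either exon 2 or exon 3,
# and both rejoin at exon 4.
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
paths = min_path_cover(5, edges)
print(paths)
```

Because this variant requires vertex-disjoint paths, the alternative exon 3 ends up in a path of its own; a formulation in which paths may share exons, as transcript isoforms do, would need a different treatment.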
Year
2013
DOI
10.1093/bioinformatics/btt127
Venue
BIOINFORMATICS
Field
Genome, RNA, RNA-Seq, Computer science, Transcriptome, Directed acyclic graph, Genomics, Contig, Bioinformatics, Molecular Sequence Annotation
DocType
Journal
Volume
29
Issue
10
ISSN
1367-4803
Citations
7
PageRank
0.62
References
11
Authors
3
Name            Order   Citations   PageRank
Ergude Bao      1       34          6.95
Tao Jiang       2       1809        155.32
Thomas Girke    3       136         9.39