Title
Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis.
Abstract
Motivation: De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet, the difficulty of transcriptome assembly means that the contigs provided by the assembler often represent a fractured and incomplete view of the transcriptome, complicating downstream analysis. We introduce Grouper, a new method for clustering contigs from de novo assemblies that are likely to belong to the same transcripts and genes; these groups can subsequently be analyzed more robustly. When provided with access to the genome of a related organism, Grouper can transfer annotations to the de novo assembly, further improving the clustering.
Results: On de novo assemblies from four different species, we show that Grouper is able to accurately cluster a larger number of contigs than the existing state-of-the-art method. The Grouper pipeline is able to map greater than 10% more reads against the contigs, leading to accurate downstream differential expression analyses. The labeling module, in the presence of a closely related annotated genome, can efficiently transfer annotations to the contigs and use this information to further improve clustering. Overall, Grouper provides a complete and efficient pipeline for processing de novo transcriptomic assemblies.
Availability and implementation: The Grouper software is freely available at https://github.com/COMBINE-lab/grouper under the 2-clause BSD license.
Year
2018
DOI
10.1093/bioinformatics/bty378
Venue
BIOINFORMATICS
Field
Graph, Data mining, Text mining, Annotation, Computer science, Transcriptome, Grouper, Cluster analysis
DocType
Journal
Volume
34
Issue
19
ISSN
1367-4803
Citations
0
PageRank
0.34
References
3
Authors
3
Name | Order | Citations | PageRank
Laraib Malik | 1 | 2 | 1.44
Fatemeh Almodaresi | 2 | 7 | 2.56
Rob Patro | 3 | 11112.98