Title
SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.
Abstract
Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After the sequence reads are obtained, the short reads are mapped to a reference genome, and specific mapping patterns, called read mapping profiles, can be detected; these are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs.
Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. On a simulated benchmark dataset, SHARAKU outperformed previous methods in correctly clustering read mapping profiles with respect to 5'-end processing and 3'-end processing, separating them from degradation patterns, and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental small RNA sequencing data for the common marmoset brain, SHARAKU succeeded in identifying significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain.
Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link.
Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work are available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502.
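To make the idea of aligning two read mapping profiles concrete, the following is a minimal illustrative sketch, not SHARAKU's actual method. It assumes a profile is simply a list of per-position read depths and aligns two such lists with a generic Needleman-Wunsch-style dynamic program. The function names, the depth-similarity score, and the gap penalty are hypothetical choices made for this example; the abstract states that SHARAKU additionally incorporates primary sequence and secondary structure, which this sketch omits.

```python
from typing import List


def normalize(profile: List[float]) -> List[float]:
    """Scale read depths to [0, 1] by the maximum depth (assumed normalization)."""
    peak = max(profile) if profile else 0.0
    if peak == 0:
        return [0.0] * len(profile)
    return [depth / peak for depth in profile]


def align_profiles(a: List[float], b: List[float], gap: float = -0.5) -> float:
    """Global alignment score of two read-depth profiles.

    Aligning position i of `a` with position j of `b` scores
    1 - 2 * |x_i - y_j| on the normalized depths, so identical depths
    score +1 and maximally different depths score -1. Each gap costs
    `gap`. Both the score and the gap value are illustrative assumptions.
    """
    x, y = normalize(a), normalize(b)
    n, m = len(x), len(y)

    # dp[i][j] = best score for aligning x[:i] with y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + 1 - 2 * abs(x[i - 1] - y[j - 1])
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


if __name__ == "__main__":
    central_peak = [0, 5, 10, 5, 0]       # reads pile up mid-transcript
    same_shape = [0, 50, 100, 50, 0]      # same shape at higher depth
    five_prime_peak = [10, 5, 0, 0, 0]    # reads pile up at the 5' end
    print(align_profiles(central_peak, same_shape))
    print(align_profiles(central_peak, five_prime_peak))
```

Pairwise scores of this kind could then be fed to any standard clustering procedure to group profiles with similar shapes; how SHARAKU itself scores and clusters profiles is described in the paper, not here.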
Year: 2016
DOI: 10.1093/bioinformatics/btw273
Venue: BIOINFORMATICS
Field: RNA, Deep sequencing, Small RNA, Computer science, Small nucleolar RNA, Algorithm, Accession number (library science), Bioinformatics, Cluster analysis, Non-coding RNA, Reference genome
DocType: Journal
Volume: 32
Issue: 12
ISSN: 1367-4803
Citations: 1
PageRank: 0.37
References: 10
Authors: 7
Name | Order | Citations | PageRank
Mariko Tsuchiya | 1 | 1 | 0.37
Kojiro Amano | 2 | 1 | 0.37
Masaya Abe | 3 | 1 | 0.37
Misato Seki | 4 | 1 | 0.37
Sumitaka Hase | 5 | 1 | 0.37
Kengo Sato | 6 | 392 | 22.46
Yasubumi Sakakibara | 7 | 769 | 62.91