Title
Designing filters for fast-known NcRNA identification.
Abstract
Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be efficiently scanned against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/.
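The filtering idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how an SSP is scored, so everything here is an assumption made for illustration: the log-odds scoring, the uniform background, the fixed bonus for complementary bases at paired columns, the toy profile, and the names `to_log_odds`, `window_score`, `scan`, and `column`. It covers a single profile only; it is not the authors' method and does not combine evidence from multiple SSP matches.

```python
import math

BACKGROUND = 0.25  # assumed uniform background frequency for A, C, G, U

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_log_odds(profile):
    """Turn per-column base probabilities into log2-odds scores."""
    return [{b: math.log2(p / BACKGROUND) for b, p in col.items()} for col in profile]


def window_score(window, pwm, paired_cols, pair_bonus=1.0):
    """Score one window: PWM log-odds plus a bonus per complementary paired column.

    The pair bonus is a hypothetical stand-in for secondary structure information.
    """
    seq_score = sum(pwm[i].get(base, 0.0) for i, base in enumerate(window))
    pair_score = sum(
        pair_bonus for i, j in paired_cols if (window[i], window[j]) in PAIRS
    )
    return seq_score + pair_score


def scan(sequence, pwm, paired_cols, threshold):
    """Slide the profile along the sequence and keep windows scoring >= threshold.

    Only the returned windows would be passed on to the expensive CM search.
    """
    rna = sequence.upper().replace("T", "U")  # accept DNA input
    width = len(pwm)
    hits = []
    for start in range(len(rna) - width + 1):
        score = window_score(rna[start:start + width], pwm, paired_cols)
        if score >= threshold:
            hits.append((start, score))
    return hits


def column(consensus):
    """Toy profile column: probability 0.7 for the consensus base, 0.1 otherwise."""
    return {b: (0.7 if b == consensus else 0.1) for b in "ACGU"}


# Toy 6-column profile with consensus GCAUGC; columns (0, 5) and (1, 4) are paired.
DEMO_PWM = to_log_odds([column(b) for b in "GCAUGC"])
DEMO_PAIRS = [(0, 5), (1, 4)]
```

Calling `scan("TTTTGCATGCTTTT", DEMO_PWM, DEMO_PAIRS, threshold=9.0)` returns the list of `(start, score)` windows that pass the filter. Raising the threshold trades sensitivity for speed, which is the tension the abstract describes.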
Year: 2012
DOI: 10.1109/TCBB.2011.149
Venue: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Keywords: fast-known rna identification,web sites,high sensitivity,noncoding rna families,website,unfiltered cm search,fast-known ncrna identification,genomic dna,sequence annotation,ncrna instance,ssp-based filters,covariance analysis,dna,ssp filter,genomics,genome-wide search,algorithms for data and knowledge,soil metagenomic data set,various ncrna family,ncrna search tool,ncrna family,rna,physiological models,biology computing,molecular biophysics,designing filters,filters,ssp-based filter,secondary structure information,covariance model,multiple secondary structure profiles,feature extraction or construction,bioinformatics (genome or protein),multiple ssp match,formal languages.,algorithm design and analysis,feature extraction,formal languages,bioinformatics,formal language,secondary structure,hidden markov models,dynamic programming,position weight matrix,sensitivity,noncoding rna
Field: Dynamic programming,Data mining,Rfam,Algorithm design,Computer science,Position weight matrix,Genomics,Metagenomics,Bioinformatics,Hidden Markov model,Speedup
DocType: Journal
Volume: 9
Issue: 3
ISSN: 1545-5963
Citations: 2
PageRank: 0.40
References: 18
Authors: 3
Name           Order  Citations  PageRank
Yanni Sun      1      219        21.16
Jeremy Buhler  2      821        93.45
Cheng Yuan     3      7          0.75