Title
NASCUP: Nucleic Acid Sequence Classification by Universal Probability
Abstract
Motivated by the need for fast and accurate classification of unlabeled nucleotide sequences on a large scale, we propose a new classification method that captures the probabilistic structure of a sequence family as a compact context-tree model and uses it efficiently to test proximity and membership of a query sequence. The proposed nucleic acid sequence classification by universal probability (NASCUP) method relies on the notion of universal probability from information theory in both model building and classification, delivering BLAST-like accuracy with runtime reduced by orders of magnitude on large-scale databases. A comprehensive experimental study involving seven public databases for functional non-coding RNA classification and microbial taxonomy classification demonstrates the advantages of NASCUP over widely used alternatives in efficiency, accuracy, and scalability across all datasets considered. [availability: http://data.snu.ac.kr/nascup]
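The idea of scoring a query against a family model with a universal probability can be illustrated with a minimal sketch. It is not the NASCUP implementation: it assumes a fixed-order context model in place of a context tree, the Krichevsky-Trofimov (add-1/2) sequential estimator as the universal probability, and sequences over the alphabet ACGT only. The names `train_counts`, `code_length_bits`, `classify`, and the `depth` parameter are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: inputs contain only these four symbols


def train_counts(sequences, depth=2):
    """Count how often each base follows each length-`depth` context."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    for seq in sequences:
        for i in range(depth, len(seq)):
            counts[seq[i - depth:i]][seq[i]] += 1
    return counts


def code_length_bits(query, counts, depth=2):
    """Average code length (bits per base) of `query` under the family counts,
    using the KT predictive rule p = (count + 1/2) / (total + |alphabet| / 2)."""
    positions = len(query) - depth
    if positions <= 0:
        return math.inf  # query too short to score at this depth
    zero = dict.fromkeys(ALPHABET, 0)
    total_bits = 0.0
    for i in range(depth, len(query)):
        ctx = counts.get(query[i - depth:i], zero)  # unseen context: all zeros
        p = (ctx[query[i]] + 0.5) / (sum(ctx.values()) + len(ALPHABET) / 2)
        total_bits -= math.log2(p)
    return total_bits / positions


def classify(query, models, depth=2):
    """Return the family label whose model gives the shortest code length."""
    return min(models, key=lambda label: code_length_bits(query, models[label], depth))


# Toy usage with two made-up families (illustrative data, not from the paper).
models = {
    "AT": train_counts(["ATATATATATAT", "TATATATATA"]),
    "GC": train_counts(["GCGCGCGCGCGC", "CGCGCGCGCG"]),
}
print(classify("ATATATAT", models))
```

A shorter code length means the query is better predicted by that family's statistics, which is the sense in which a universal probability can test proximity and membership. A context-tree model would additionally adapt the context depth per context rather than fixing it.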
Year
2015
Venue
CoRR
Field
Information theory, Data mining, Anomaly detection, Computer science, Nucleic acid sequence, Bioinformatics
DocType
Journal
Volume
abs/1511.04944
Citations
0
PageRank
0.34
References
1
Authors
5
Name            Order  Citations  PageRank
Sunyoung Kwon   1      9          2.31
Gyuwan Kim      2      1          2.04
Byunghan Lee    3      110        7.98
Sungroh Yoon    4      566        78.80
Young-Han Kim   5      318        48.11