Title
Hadoop-RINS - A Hadoop Accelerated Pipeline for Rapid Nonhuman Sequence Identification.
Abstract
The volume of sequencing data has grown rapidly in recent years with the development of high-throughput sequencing technology, and parallel computing is an important way to process such large volumes of sequence data. RINS is a pipeline for identifying nonhuman sequences in deep sequencing datasets. It uses user-provided microbial reference genomes to reduce the number of reads to be processed and thereby improve processing speed. However, all of its steps run serially, so its processing speed drops sharply as the sequencing data and reference genomes grow. In this article, we report Hadoop-RINS, a pipeline that processes sequencing data in parallel using Hadoop. In a runtime comparison on the same dataset, Hadoop-RINS is significantly faster than RINS while producing the same results.
Year: 2013
DOI: null
Venue: BIOINFORMATICS 2013: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS
Keywords: High-Throughput Sequencing,Metagenomics,RINS,Hadoop,MapReduce
Field: Data science,Computer science,Artificial intelligence,Machine learning
DocType: Conference
Volume: null
Issue: null
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name | Order | Citations | PageRank
Jiangyu Li | 1 | 2 | 2.13
Yang Liu | 2 | 1568 | 126.97
Xiaolei Wang | 3 | 16 | 2.94
Yiqing Mao | 4 | 0 | 0.68
Yumin Wang | 5 | 0 | 0.34
Dongsheng Zhao | 6 | 64 | 6.88