Title | ||
---|---|---|
Hadoop-RINS - A Hadoop Accelerated Pipeline for Rapid Nonhuman Sequence Identification. |
Abstract | ||
---|---|---|
The volume of sequencing data has increased rapidly in recent years with the development of high-throughput sequencing technology. Parallel computing is an important way to accelerate the processing of such large volumes of sequence data. RINS is a pipeline for identifying nonhuman sequences in deep sequencing datasets. It uses user-provided microbial reference genomes to reduce the number of reads to be processed and thereby improve processing speed. However, all of its steps run serially, so its processing speed drops sharply as the sequencing data and reference genomes grow. In this article, we report Hadoop-RINS, a pipeline that processes sequencing data in parallel through Hadoop. A runtime comparison on the same dataset shows that Hadoop-RINS is significantly faster than RINS while producing the same computation result. |
Year | DOI | Venue |
---|---|---|
2013 | null | BIOINFORMATICS 2013: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS |
Keywords | Field | DocType |
---|---|---|
High-Throughput Sequencing, Metagenomics, RINS, Hadoop, MapReduce | Data science, Computer science, Artificial intelligence, Machine learning | Conference |
Volume | Issue | Citations |
---|---|---|
null | null | 0 |
PageRank | References | Authors |
---|---|---|
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jiangyu Li | 1 | 2 | 2.13 |
Yang Liu | 2 | 1568 | 126.97 |
Xiaolei Wang | 3 | 16 | 2.94 |
Yiqing Mao | 4 | 0 | 0.68 |
Yumin Wang | 5 | 0 | 0.34 |
Dongsheng Zhao | 6 | 64 | 6.88 |