Hadoop-RINS - A Hadoop Accelerated Pipeline for Rapid Nonhuman Sequence Identification. - Citegraph

Paper Info

Title
Hadoop-RINS - A Hadoop Accelerated Pipeline for Rapid Nonhuman Sequence Identification.

Abstract
The volume of sequencing data has grown rapidly in recent years with the development of high-throughput sequencing technology, and parallel computing is an important way to accelerate the processing of such large datasets. RINS is a pipeline for identifying nonhuman sequences in deep sequencing datasets. It uses user-provided microbial reference genomes to reduce the number of reads to be processed and thereby improve processing speed. However, all of its steps run serially, so the processing speed of RINS drops sharply as the sequencing data and reference genomes grow. In this article, we report a pipeline that processes sequencing data in parallel through Hadoop. A runtime comparison on the same dataset shows that Hadoop-RINS is significantly faster than RINS while producing the same computational result.

Year	2013
DOI	null
Venue	BIOINFORMATICS 2013: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BIOINFORMATICS MODELS, METHODS AND ALGORITHMS
Keywords	High-Throughput Sequencing, Metagenomics, RINS, Hadoop, MapReduce
Field	Data science, Computer science, Artificial intelligence, Machine learning
DocType	Conference
Volume	null
Issue	null
Citations	0
PageRank	0.34
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Jiangyu Li	1	2	2.13
Yang Liu	2	1568	126.97
Xiaolei Wang	3	16	2.94
Yiqing Mao	4	0	0.68
Yumin Wang	5	0	0.34
Dongsheng Zhao	6	64	6.88