MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes.

Paper Info

Title
MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes.

Abstract
With the advent of next-generation sequencing, traditional bioinformatics tools are challenged by massive raw metagenomic datasets. One of the bottlenecks of metagenomic studies is the lack of large-scale data analysis tools suited to cloud computing. In this paper, we propose a Spark-based tool, called MetaSpark, to recruit metagenomic reads to reference genomes. MetaSpark benefits from Spark's resilient distributed dataset (RDD), which lets it cache datasets in memory across cluster nodes and scale well with dataset size. Compared with previous metagenomic recruitment tools, MetaSpark recruited significantly more reads than programs such as SOAP2, BWA and LAST, and recruited about 4% more reads than FR-HIT on a test case with 1 million reads and 0.75 GB of references. Different test cases demonstrate MetaSpark's scalability and overall high performance.

Year	DOI	Venue
2017	10.1093/bioinformatics/btw750	BIOINFORMATICS
Field	DocType	Volume
Genome,Spark (mathematics),Computer science,Metagenomics,Bioinformatics	Journal	33
Issue	ISSN	Citations
7	1367-4803	9
PageRank	References	Authors
0.67	5	5

Authors (5 rows)

Name	Order	Citations	PageRank
Wei Zhou	1	9	1.01
Ruilin Li	2	16	7.90
Shuo Yuan	3	9	0.67
ChangChun Liu	4	9	0.67
Shaowen Yao	5	86	26.85
