Title
A scalable data analysis platform for metagenomics
Abstract
With the advent of high-throughput DNA sequencing technology, the analysis and management of the increasing amount of biological sequence data have become a bottleneck for scientific progress. For example, MG-RAST, a metagenome annotation system serving a large scientific community worldwide, has experienced sustained, exponential growth in data submissions for several years, and this trend is expected to continue. To address the computational challenges posed by this workload, we developed a new data analysis platform comprising a data management system (Shock) for biological sequence data and a workflow management system (AWE) that supports scalable, fault-tolerant task and resource management. Shock and AWE can be used to build a scalable and reproducible data analysis infrastructure for upper-level biological data analysis services.
Year
2013
DOI
10.1109/BigData.2013.6691723
Venue
BigData Conference
Keywords
metagenomics,workflow,workflow management system,shock,data submissions,genomics,mg-rast,data analysis,scientific progress,upper-level biological data analysis services,biology computing,data analysis platform,biological sequence data,high-throughput dna sequencing technology,awe,cloud computing,dna,bioinformatics,scalable data analysis platform,data management system,metagenome annotation system
Field
Resource management,Data science,Data mining,Biological data,Bottleneck,Computer science,Data management,Workflow management system,Workflow,Scalability,Cloud computing
DocType
Conference
ISSN
2639-1589
Citations
16
PageRank
1.06
References
12
Authors
6
Name              Order  Citations  PageRank
Wei Tang          1      44         2.48
Jared Wilkening   2      48         3.77
Narayan Desai     3      319        29.73
Wolfgang Gerlach  4      81         7.03
Andreas Wilke     5      314        23.84
Folker Meyer      6      484        51.83