Title
Challenges and approaches for distributed workflow-driven analysis of large-scale biological data: vision paper
Abstract
Next-generation DNA sequencing machines are generating a very large amount of sequence data with applications in many scientific challenges and placing unprecedented demands on traditional single-processor bioinformatics algorithms. Middleware and technologies for scientific workflows and data-intensive computing promise new capabilities to enable rapid analysis of next-generation sequence data. Based on this motivation and our previous experiences in bioinformatics and distributed scientific workflows, we are creating a Kepler Scientific Workflow System module, called "bioKepler", that facilitates the development of Kepler workflows for integrated execution of bioinformatics applications in distributed environments. This vision paper discusses the challenges related to next-generation sequencing data, explains the approaches taken in bioKepler to help with analysis of such data, and presents preliminary results demonstrating these approaches.
Year
2012
DOI
10.1145/2320765.2320791
Venue
EDBT/ICDT Workshops
Keywords
kepler workflows, scientific workflows, next-generation dna sequencing machine, scientific challenge, next-generation sequence data, kepler scientific workflow system, sequence data, next-generation sequencing data, large-scale biological data, traditional single-processor bioinformatics algorithm, vision paper, bioinformatics application, workflow-driven analysis, next generation sequencing, data intensive computing, bioinformatics, biological data, dna sequence, distributed environment, application
Field
Middleware, Data science, Biological data, Kepler scientific workflow system, Computer science, Data sequences, Kepler, Workflow
DocType
Conference
Citations
8
PageRank
0.71
References
22
Authors
4
Name
Order
Citations
PageRank
Ilkay Altintas11191106.09
Jianwu Wang2865.21
Daniel Crawl324321.02
Weizhong Li495164.65