Abstract | ||
---|---|---|
We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting (many-task computing, MTC) applications suffer poor performance and utilization on large parallel computers because of the volume of filesystem I/O and a lack of appropriate optimizations in the shared filesystem. We therefore design and implement a scalable MTC data management system that aggregates compute-node local storage to enable more efficient data movement strategies. We co-design the data management system with the data-aware scheduler to enable dataflow pattern identification and automatic optimization. The framework reduces the time to solution of the parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 cores; decreases the time to solution of a seismology application, CyberShake, by 7.9% on 2,048 cores; and delivers better BLAST performance than mpiBLAST at various scales up to 32,768 cores, while preserving the flexibility of the original BLAST application. |
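The abstract's central idea is to run tasks close to the node-local copies of the files they read, so that intermediate data need not pass through the shared filesystem. The following minimal sketch illustrates that idea as a greedy locality-aware scheduler in Python. The `Node`, `Task`, `local_bytes`, and `schedule` names, the tie-breaking rule, and the assumption that tasks arrive in dependency order are all hypothetical. This is not the data management system or data-aware scheduler described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A compute node; local_files models its node-local storage (file name -> bytes)."""
    name: str
    local_files: dict = field(default_factory=dict)
    load: int = 0  # number of tasks assigned to this node so far


@dataclass
class Task:
    """A task that reads named input files and writes named output files."""
    name: str
    inputs: list
    outputs: dict  # output file name -> size in bytes


def local_bytes(node: Node, task: Task) -> int:
    """Bytes of the task's inputs already present on the node's local storage."""
    return sum(node.local_files.get(f, 0) for f in task.inputs)


def schedule(tasks: list, nodes: list) -> dict:
    """Hypothetical greedy locality-aware placement (illustrative only).

    Tasks are assumed to arrive in dependency order. Each task runs on the node
    that already holds the most bytes of its inputs; ties go to the node with the
    lighter load, then to the lexicographically smaller name. Outputs stay on the
    chosen node's local storage, so a downstream task that reads them tends to be
    placed on the same node instead of fetching them through a shared filesystem.
    """
    placement = {}
    for task in tasks:
        best = min(nodes, key=lambda n: (-local_bytes(n, task), n.load, n.name))
        placement[task.name] = best.name
        best.load += 1
        best.local_files.update(task.outputs)
    return placement


if __name__ == "__main__":
    nodes = [Node("n0", {"a.dat": 100}), Node("n1", {"b.dat": 100})]
    tasks = [
        Task("stage1_a", ["a.dat"], {"a1.dat": 120}),
        Task("stage1_b", ["b.dat"], {"b1.dat": 120}),
        Task("stage2_a", ["a1.dat"], {"a2.dat": 10}),
    ]
    print(schedule(tasks, nodes))
    # {'stage1_a': 'n0', 'stage1_b': 'n1', 'stage2_a': 'n0'}
```

In the usage example, the second-stage task follows its intermediate input to `n0` rather than reading it from shared storage. This pipeline-style locality is the kind of dataflow pattern a data-aware scheduler can exploit. The sketch ignores node storage capacity, replication, and load imbalance, all of which a real system would have to handle.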
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/SC.2012.44 | SC |
Keywords | Field | DocType |
---|---|---|
optimisation,parallel processing,dataflow pattern identification,parallel computers,scalable parallel scripting,astronomy computing,astronomy data analysis application,parallel scripting,scalable MTC data management,data management analysis,efficient data movement strategy,filesystem I/O,data movement strategies,data analysis,filesystem abstraction,original BLAST application,file organisation,data aware scheduler,seismology application,efficient large-scale parallel execution,large parallel computer,parallel stage,automatic optimization,data management system,fault detection,fault tolerance,signal analysis,data management | Signal processing,Data analysis,Fault detection and isolation,Computer science,Parallel computing,Dataflow,Fault tolerance,Data management,Scalability,Scripting language,Distributed computing | Conference |
ISSN | ISBN | Citations |
---|---|---|
2167-4329 | 978-1-4673-0805-2 | 19 |
PageRank | References | Authors |
---|---|---|
0.72 | 25 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhao Zhang | 1 | 19 | 0.72 |
Daniel S. Katz | 2 | 1496 | 121.04 |
Justin M. Wozniak | 3 | 464 | 35.32 |
Allan Espinosa | 4 | 76 | 3.65 |
Ian Foster | 5 | 22938 | 2663.24 |