Title
Accelerating parallel analysis of scientific simulation data via Zazen
Abstract
As a new generation of parallel supercomputers enables researchers to conduct scientific simulations of unprecedented scale and resolution, terabyte-scale simulation output has become increasingly commonplace. Analysis of such massive data sets is typically I/O-bound: many parallel analysis programs spend most of their execution time reading data from disk rather than performing useful computation. To overcome this I/O bottleneck, we have developed a new data access method. Our main idea is to cache a copy of simulation output files on the local disks of an analysis cluster's compute nodes, and to use a novel task-assignment protocol to co-locate data access with computation. We have implemented our methodology in a parallel disk cache system called Zazen. By avoiding the overhead associated with querying metadata servers and by reading data in parallel from local disks, Zazen delivers a sustained read bandwidth of over 20 gigabytes per second on a commodity Linux cluster with 100 nodes, approaching the optimal aggregate I/O bandwidth attainable on these nodes. Compared with conventional NFS, PVFS2, and Hadoop/HDFS, respectively, Zazen is 75, 18, and 6 times faster for accessing large (1-GB) files, and 25, 13, and 85 times faster for accessing small (2-MB) files. We have deployed Zazen in conjunction with Anton, a special-purpose supercomputer that dramatically accelerates molecular dynamics (MD) simulations, and have been able to accelerate the parallel analysis of terabyte-scale MD trajectories by about an order of magnitude.
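The central idea of co-locating data access with computation can be illustrated with a small sketch. It assumes that each node can report which files it holds in its local cache (the dictionary `local_inventory`), that these reports have already been gathered so every process sees the same inventory, and that a file may be cached on more than one node. The function `assign_tasks` and all node and file names are hypothetical; this is a generic locality-aware assignment under those assumptions, not the task-assignment protocol published for Zazen.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def assign_tasks(
    all_files: List[str],
    local_inventory: Dict[str, Set[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Assign each file to exactly one node that holds a local copy of it.

    Hypothetical sketch: local_inventory maps a node name to the set of files
    cached on that node's local disk. Returns (assignment, remote), where
    assignment maps node -> files it should read from its own disk, and
    remote lists files that no node has cached (these would need to be read
    from shared storage instead).
    """
    # file -> nodes holding a copy, in sorted node order for determinism
    holders: Dict[str, List[str]] = defaultdict(list)
    for node in sorted(local_inventory):
        for f in local_inventory[node]:
            holders[f].append(node)

    assignment: Dict[str, List[str]] = {node: [] for node in sorted(local_inventory)}
    remote: List[str] = []

    # Place files with the fewest replicas first, so that files with only
    # one possible holder are assigned before more flexible ones.
    for f in sorted(all_files, key=lambda name: (len(holders.get(name, [])), name)):
        candidates = holders.get(f)
        if not candidates:
            remote.append(f)  # not cached anywhere: fall back to shared storage
            continue
        # Pick the least-loaded holder; break ties by node name.
        target = min(candidates, key=lambda n: (len(assignment[n]), n))
        assignment[target].append(f)

    return assignment, remote


# Usage with a hypothetical three-node inventory; traj_001 is replicated.
inventory = {
    "node0": {"traj_000", "traj_001"},
    "node1": {"traj_001", "traj_002"},
    "node2": {"traj_003"},
}
files = ["traj_000", "traj_001", "traj_002", "traj_003", "traj_004"]
assignment, remote = assign_tasks(files, inventory)
print(assignment)  # {'node0': ['traj_000', 'traj_001'], 'node1': ['traj_002'], 'node2': ['traj_003']}
print(remote)      # ['traj_004']
```

Because the result depends only on the gathered inventory and uses deterministic tie-breaking, every process could compute the same assignment independently, so each cached file is read exactly once, from a local disk, without consulting a central metadata server.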
Year: 2010
Venue: FAST
Keywords: scientific simulation data, parallel disk cache system, new data access method, local disk, analysis cluster, parallel supercomputers, parallel analysis program, data access, massive data set, execution time reading data, parallel analysis, molecular dynamics, linux cluster
Field: Bottleneck, Disk buffer, Supercomputer, Cache, Computer science, Parallel computing, Server, Real-time computing, Bandwidth (signal processing), Data access, Computer cluster
DocType: Conference
Citations: 12
PageRank: 0.78
References: 31
Authors: 6
Author names (in order): Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico D. Sacerdoti, Ron O. Dror, David Elliot Shaw