Title
GrayWulf: Scalable Clustered Architecture for Data Intensive Computing
Abstract
Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, because CPU speeds have outpaced the I/O capabilities of HPC systems and Beowulf clusters. We present GrayWulf, the architecture of a three-tier commodity-component cluster designed for a range of data-intensive computations on petascale data sets. The design goal is a system balanced in I/O performance and memory size according to Amdahl's laws. The hardware currently installed at JHU exceeds one petabyte of storage and provides 0.5 bytes/sec of I/O bandwidth and 1 byte of memory for each CPU cycle/sec. GrayWulf provides almost an order of magnitude better balance than existing systems. This paper covers its architecture and reference applications; the software design is presented in a companion paper.
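The balance figures in the abstract are Amdahl-style ratios: I/O bandwidth and memory size, each divided by the aggregate CPU cycle rate. The sketch below shows that arithmetic under stated assumptions. It treats one cycle as one instruction, follows the abstract in expressing I/O in bytes (Amdahl's original rule is usually stated in bits), and uses a hypothetical `NodeSpec` with illustrative numbers that are not GrayWulf's actual hardware.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical hardware figures (illustrative only, not GrayWulf specs)."""
    cores: int
    clock_hz: float          # per-core clock rate, cycles/sec
    memory_bytes: float      # installed RAM, bytes
    io_bytes_per_sec: float  # sustained sequential I/O, bytes/sec

    @property
    def cycles_per_sec(self) -> float:
        # Aggregate CPU cycles per second across all cores
        # (assumption: one cycle is counted as one instruction).
        return self.cores * self.clock_hz


def amdahl_io_ratio(node: NodeSpec) -> float:
    """Bytes/sec of I/O for each CPU cycle/sec (bytes per cycle)."""
    return node.io_bytes_per_sec / node.cycles_per_sec


def amdahl_memory_ratio(node: NodeSpec) -> float:
    """Bytes of memory for each CPU cycle/sec."""
    return node.memory_bytes / node.cycles_per_sec


if __name__ == "__main__":
    # Hypothetical compute-oriented node: 8 cores at 2.5 GHz,
    # 16 GB of RAM, 1 GB/s of sequential disk I/O.
    node = NodeSpec(cores=8, clock_hz=2.5e9,
                    memory_bytes=16e9, io_bytes_per_sec=1e9)
    print(f"I/O ratio:    {amdahl_io_ratio(node):.2f}")      # 0.05
    print(f"Memory ratio: {amdahl_memory_ratio(node):.2f}")  # 0.80
```

In this made-up example, the memory ratio is close to the balanced value of 1, but the I/O ratio falls an order of magnitude below the 0.5 quoted for GrayWulf. This is the kind of imbalance the abstract attributes to FLOPS-oriented designs.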
Year
2009
DOI
10.1109/HICSS.2009.234
Venue
HICSS
Keywords
data-intensive computing, software design
Field
Byte, Data-intensive computing, Supercomputer, Computer science, Petabyte, Amdahl's law, Parallel computing, Petascale computing, Instruction cycle, Operating system, Scalability
DocType
Conference
Citations
27
PageRank
1.93
References
10
Authors
12
Name | Order | Citations | PageRank
Alexander S. Szalay | 1 | 959 | 105.36
Gordon Bell | 2 | 27 | 2.27
Jan Vandenberg | 3 | 255 | 32.25
Alainna Wonders | 4 | 27 | 2.27
Randal Burns | 5 | 1955 | 115.15
Dan Fay | 6 | 72 | 6.84
Jim Heasley | 7 | 27 | 1.93
Anthony J. G. Hey | 8 | 287 | 40.29
María A. Nieto-Santisteban | 9 | 98 | 11.03
Ani Thakar | 10 | 330 | 48.74
Catharine Van Ingen | 11 | 210 | 21.45
Richard Wilton | 12 | 28 | 2.62