Title
DASH: a Recipe for a Flash-based Data Intensive Supercomputer
Abstract
Data-intensive computing can be defined as computation involving large datasets and complicated I/O patterns. Data-intensive computing is challenging because there is a five-orders-of-magnitude latency gap between main memory DRAM and spinning hard disks; as a result, an inordinate amount of time in data-intensive computing is spent accessing data on disk. To address this problem we designed and built a prototype data-intensive supercomputer named DASH that exploits flash-based Solid State Drive (SSD) technology and virtually aggregated DRAM to fill the latency gap. DASH uses commodity parts, including Intel® X25-E flash drives and distributed shared memory (DSM) software from ScaleMP®. The system is highly competitive with several commercial offerings on metrics including achieved IOPS (input/output operations per second), IOPS per dollar of system acquisition cost, IOPS per watt during operation, and IOPS per gigabyte (GB) of available storage. We present an overview of the design of DASH, an analysis of its cost efficiency, and a detailed recipe for how we designed and tuned it for high data performance. Lastly, we show that when running data-intensive scientific applications from graph theory, biology, and astronomy, we achieved as much as two orders-of-magnitude speedup compared to the same applications run on traditional architectures.
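The abstract compares systems by four figures of merit: raw IOPS and IOPS normalized by acquisition cost, by operating power, and by storage capacity. The sketch below only illustrates how those ratios are formed. The class and function names are hypothetical, and the input numbers are placeholders chosen for easy arithmetic, not measurements of DASH or of any commercial system.

```python
from dataclasses import dataclass


@dataclass
class SystemSpec:
    """Hypothetical inputs for comparing storage systems (not DASH's measured values)."""
    iops: float          # achieved I/O operations per second
    cost_dollars: float  # system acquisition cost in dollars
    power_watts: float   # power draw during operation in watts
    capacity_gb: float   # available storage in gigabytes


def efficiency_metrics(spec: SystemSpec) -> dict[str, float]:
    """Return raw IOPS plus the three normalized ratios named in the abstract."""
    return {
        "IOPS": spec.iops,
        "IOPS/$": spec.iops / spec.cost_dollars,
        "IOPS/W": spec.iops / spec.power_watts,
        "IOPS/GB": spec.iops / spec.capacity_gb,
    }


if __name__ == "__main__":
    # Placeholder numbers, used only to make the arithmetic easy to follow.
    example = SystemSpec(iops=100_000, cost_dollars=50_000,
                         power_watts=2_000, capacity_gb=1_000)
    for name, value in efficiency_metrics(example).items():
        print(f"{name:>8}: {value:,.2f}")
```

Normalizing by cost, power, and capacity lets a smaller commodity system be compared fairly against larger commercial offerings whose raw IOPS may be higher.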
Year
2010
DOI
10.1109/SC.2010.16
Venue
SC
Keywords
prototype data intensive supercomputer, spinning hard disks, system acquisition cost, dash, data intensive computing, i/o patterns, main memory, solid state drive, distributed shared memory software, dram chips, ssd technology, scalemp, latency gap, cost efficiency, memory dram, flash-based solid state drive technology, o pattern, graph theory, x25-e flash drives, flash-based data intensive supercomputer, parallel machines, hard discs, five-orders-of-magnitude latency gap, aggregated dram, dsm software, tuning, distributed shared memory, measurement, input output, memory management, spinning
Field
Data-intensive computing, Supercomputer, IOPS, Computer science, Gigabyte, Parallel computing, Memory management, Solid-state drive, Distributed shared memory, Operating system, Speedup, Distributed computing
DocType
Conference
ISBN
978-1-4244-7558-2
Citations
32
PageRank
2.11
References
9
Authors
5
Name | Order | Citations | PageRank
Jiahua He | 1 | 149 | 9.28
Arun Jagatheesan | 2 | 172 | 14.90
Sandeep Gupta | 3 | 79 | 9.00
Jeffrey Bennett | 4 | 75 | 9.82
Allan Snavely | 5 | 1177 | 70.79