Title
Benchmarking SciDB data import on HPC systems
Abstract
SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced in-database analytics, thus reducing the need to extract data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of massive datasets compared to the traditional approaches of reading volumetric data from individual files. This work describes the D4M and SciDB tools we developed and presents the initial performance results. This performance was achieved by using parallel inserts, in-database merging of arrays, and supercomputing techniques such as distributed arrays and single-program-multiple-data programming.
Year
2016
DOI
10.1109/HPEC.2016.7761617
Venue
2016 IEEE High Performance Extreme Computing Conference (HPEC)
Keywords
benchmarking SciDB data import, scalable computational database management system, SciDB array data model, imaging data management, imaging data storage, database advanced analytics, data analysis, commodity hardware, high performance computing environment, HPC systems, dynamic distributed dimensional data model, D4M software, cluster benchmark, MIT SuperCloud software stack, parallel inserts, in-database array merging, supercomputing techniques, distributed arrays, single-program-multiple-data programming
DocType
Conference
Volume
abs/1609.07545
ISSN
2377-6943
ISBN
978-1-5090-3526-7
Citations
6
PageRank
0.48
References
5
Authors
19
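| Name | Order | Citations | PageRank |
|---|---|---|---|
| Siddharth Samsi | 1 | 201 | 24.09 |
| Laura J. Brattain | 2 | 27 | 6.64 |
| William Arcand | 3 | 175 | 17.77 |
| David Bestor | 4 | 181 | 19.08 |
| Bill Bergeron | 5 | 168 | 16.57 |
| Chansup Byun | 6 | 180 | 19.21 |
| Vijay Gadepally | 7 | 449 | 50.53 |
| Michael Houle | 8 | 63 | 10.42 |
| Matthew Hubbell | 9 | 192 | 20.93 |
| Michael J. Jones | 10 | 11341 | 927.21 |
| Anna Klein | 11 | 49 | 10.10 |
| Peter Michaleas | 12 | 201 | 20.93 |
| Lauren Milechin | 13 | 102 | 16.45 |
| Julie Mullen | 14 | 138 | 15.22 |
| Andrew Prout | 15 | 182 | 18.78 |
| Antonio Rosa | 16 | 170 | 17.67 |
| Charles Yee | 17 | 147 | 15.14 |
| Jeremy Kepner | 18 | 606 | 61.58 |
| Albert Reuther | 19 | 335 | 37.32 |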