Abstract |
---|
Distributed resiliency has become paramount in HPC systems for alleviating the growing costs of data movement and I/O while preserving data accuracy. This paper proposes adopting blockchain-like decentralized protocols to achieve such distributed resiliency. The key challenge in such an adoption lies in the mismatch between the systems blockchains target (e.g., shared-nothing, loosely coupled, TCP/IP-based) and HPC's unique design of storage subsystems, resource allocation, and programming models. We present BAASH (Blockchain-As-A-Service for HPC), which is deployable in a plug-and-play fashion. BAASH bridges the HPC-blockchain gap with two key components: (i) lightweight consensus protocols for HPC's shared-storage architecture, and (ii) a new fault-tolerance mechanism that compensates for the lack of fault tolerance in MPI to guarantee distributed resiliency. We have implemented a prototype system and evaluated it with more than two million transactions on a 500-core HPC cluster. Results show that the prototype significantly outperforms vanilla blockchain systems and exhibits strong reliability with MPI. |

Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3458817.3476155 | The International Conference for High Performance Computing, Networking, Storage, and Analysis |

Keywords | DocType | ISSN |
---|---|---|
Blockchain, MPI, fault tolerance, resilience, reproducibility, HPC | Conference | 2167-4329 |

ISBN | Citations | PageRank |
---|---|---|
978-1-6654-8390-2 | 1 | 0.34 |

References | Authors |
---|---|
14 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Abdullah Al Mamun | 1 | 350 | 43.65 |
Feng Yan | 2 | 40 | 7.98 |
Dongfang Zhao | 3 | 1 | 0.34 |