Title
NetEC: Accelerating Erasure Coding Reconstruction With In-Network Aggregation
Abstract
In distributed storage systems, Erasure Coding (EC) is a crucial technology for enabling high data availability. By downloading parity data from surviving machines, EC can reconstruct lost data with much lower storage overhead than data replication. However, this reduction in storage cost comes at the expense of extra performance problems: low reconstruction rate, high degraded read latency, and high host CPU utilization. Our analysis shows that these performance problems are deeply rooted in host-based EC processing. To resolve these problems, we present NetEC, an in-network acceleration framework that fully offloads EC to the new generation of programmable switching ASICs. We propose Explicit Buffer Size Notification (EBSN) to constrain decoding buffer usage, and design an on-switch one-to-many TCP proxy to integrate EBSN with TCP. We also design two parallel Galois Field (GF) offloading methods (table lookup and bitmatrix) to maximize parsable bytes. We implement NetEC on programmable switches and integrate it with HDFS. Extensive evaluations show that NetEC improves the reconstruction rate by 2.7x-6.8x, reduces the degraded read latency significantly, and removes the host CPU overhead completely. We also emulate multi-rack scenarios and show that NetEC is able to support a ~GB/s reconstruction rate and tens of concurrent tasks.
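To make the table-lookup idea in the abstract concrete, the following is a minimal host-side sketch in Python. It is not NetEC's on-switch implementation. It assumes GF(2^8) with the primitive polynomial 0x11D, which is common in storage erasure codes, and a toy code with two data blocks and one parity block. All names (`gf_mul`, `gf_inv`, `aggregate`) are hypothetical. The `aggregate` function mirrors the general shape of reconstruction as aggregation: each surviving block is scaled by a GF coefficient and the results are XORed together.

```python
# Assumption: GF(2^8) with primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
PRIM_POLY = 0x11D

# Build exponent (antilog) and log tables using generator 2.
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:          # reduce modulo the primitive polynomial
        x ^= PRIM_POLY
for i in range(255, 512):  # duplicate so LOG[a] + LOG[b] needs no modulo
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements with two log lookups and one exp lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); zero has no inverse."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def aggregate(coeffs, blocks):
    """XOR together equally sized blocks, each scaled by its GF coefficient."""
    out = bytearray(len(blocks[0]))
    for c, blk in zip(coeffs, blocks):
        for j, byte in enumerate(blk):
            out[j] ^= gf_mul(c, byte)
    return bytes(out)


# Toy code (hypothetical): parity = 1*d0 + 2*d1 over GF(2^8).
d0 = b"hello"
d1 = b"world"
parity = aggregate([1, 2], [d0, d1])

# Suppose d1 is lost: d1 = inv(2)*d0 + inv(2)*parity.
inv2 = gf_inv(2)
recovered = aggregate([inv2, inv2], [d0, parity])
```

Here `recovered` equals `d1`, because addition in GF(2^8) is XOR and the two `inv2 * d0` terms cancel. A production code would use more blocks and a proper encoding matrix; the sketch only shows the lookup-and-XOR pattern.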
Year: 2022
DOI: 10.1109/TPDS.2022.3145836
Venue: IEEE Transactions on Parallel and Distributed Systems
Keywords: Erasure coding, distributed storage systems, programmable switch, software-defined networks
DocType: Journal
Volume: 33
Issue: 10
ISSN: 1045-9219
Citations: 1
PageRank: 0.36
References: 14
Authors: 8
Name            Order  Citations  PageRank
Yi Qiao         1      2          0.72
Menghao Zhang   2      1          0.36
Yu Zhou         3      9          2.95
Xiao Kong       4      3          1.13
Han Zhang       5      1          0.70
Jun Bi          6      909        107.27
Jun Bi          7      909        107.27
Jilong Wang     8      57         19.88