Title
Fabsim-X: A Simulation Framework for the Analysis of Large-Scale Topologies and Congestion Control Protocols in Data Center Networks
Abstract
The explosive growth of cloud computing and of data center systems overall has placed unprecedented demand on system architects and designers to continuously develop more complex system networks that can effectively process, move, and store large amounts of data. Nonlinear system behavior caused by emerging workloads and use cases, varying end-to-end congestion protocols, and heterogeneity in the compute and storage capabilities of custom-designed accelerators further compounds the design problem. Modern simulation methodologies lack a cohesive and efficient framework to address the interoperability of the intersecting layers at scale. In this paper, we present a simulation framework for evaluating congestion control protocols. Furthermore, we present a set of optimizations that enable analysis for longer simulated times and at network scales up to 128K nodes, which is vital for proper analysis of workloads that require long run times (e.g., AI training) or workloads that are known to have scaling issues (e.g., RDMA). Specifically, we evaluate congestion control performance at various scales, study the implications of topology scaling on congestion, and examine the performance impact of simultaneous heterogeneous protocols.
Year
2020
DOI
10.1109/MASCOTS50786.2020.9285933
Venue
2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
Keywords
Performance Simulation, Networking, Congestion Control, Fat-Trees, TCP, iWARP, RoCEv2
DocType
Conference
ISSN
1526-7539
ISBN
978-1-7281-9239-0
Citations
0
PageRank
0.34
References
8
Authors
14