Title
Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers
Abstract
The dragonfly topology is a popular choice for building high-radix, low-diameter, hierarchical networks with high-bandwidth links. On Cray installations of the dragonfly network, job placement policies and routing inefficiencies can lead to significant network congestion for a single job and multi-job workloads. In this paper, we explore the effects of job placement, parallel workloads and network configurations on network health to develop a better understanding of inter-job interference. We have developed a functional network simulator, Damselfly, to model the network behavior of Cray Cascade, and a visual analytics tool, DragonView, to analyze the simulation output. We simulate several parallel workloads based on five representative communication patterns on up to 131,072 cores. Our simulations and visualizations provide unique insight into the buildup of network congestion and present a trade-off between deployment dollar costs and performance of the network.
Year
2016
DOI
10.1109/IPDPS.2016.123
Venue
2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Keywords
dragonfly network, congestion, inter-job interference, simulation, visual analytics
Field
Software deployment, Computer science, Parallel computing, Computer network, Visual analytics, Network simulation, Network topology, Bandwidth (signal processing), Network congestion, Network behavior, Network traffic control, Distributed computing
DocType
Conference
ISSN
1530-2075
ISBN
978-1-5090-2141-3
Citations
10
PageRank
0.53
References
9
Authors
5
Name               Order  Citations  PageRank
Abhinav Bhatele    1      625        43.42
Nikhil Jain        2      321        24.01
Yarden Livnat      3      607        50.10
Valerio Pascucci   4      3241       192.33
Peer-Timo Bremer   5      1446       82.47