Title
Predictive and Distributed Routing Balancing, an Application-Aware Approach.
Abstract
The interconnection design in computing clusters and data centers is expected to change significantly in the near future to sustain the increasing communication demand at controlled capital and operational cost. In particular, academic research and industry initiatives envision a shift from typical and expensive full-bisection bandwidth interconnects (which safely cover the worst communication cases) to application-oriented designs (which may provide cost-efficient data movement at larger system scales). Having information about the communication dynamics of applications (e.g. repetitiveness, computing and communication phases, traffic pattern and bandwidth) allows network resources to be managed and provisioned efficiently at reduced cost. This paper presents an Application-Aware Predictive and Distributed Routing Balancing technique (PR-DRB), a new method that controls network inefficiencies based on the communication patterns of applications and speculative routing. PR-DRB monitors increments in the communication latency and then dynamically redistributes the network traffic over multiple paths (path expansion) to deal with load imbalances. Additionally, PR-DRB stores the number of paths used to balance the traffic (the solution) and links it to the application pattern that caused the imbalance (the problem). This information allows PR-DRB to respond to similar situations in repetitive patterns, quickly converging to a stable solution. Evaluation results show latency and completion time reductions of up to 37% for experiments conducted on 64 nodes executing the NAS benchmarks and the LAMMPS application.
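The monitor, expand, and remember loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name PredictiveBalancer, the latency threshold, the path limit, and the representation of a communication pattern as a set of (source, destination) pairs are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

# Hypothetical pattern signature: the set of (source, destination) flows
# that are active when congestion is observed.
Pattern = FrozenSet[Tuple[int, int]]


@dataclass
class PredictiveBalancer:
    latency_threshold: float   # assumed congestion threshold (arbitrary units)
    max_paths: int = 4         # assumed upper bound on alternative paths
    paths: int = 1             # number of paths currently in use
    solutions: Dict[Pattern, int] = field(default_factory=dict)

    def observe(self, pattern: Pattern, latency: float) -> int:
        """Process one latency sample and return the number of paths to use."""
        if latency > self.latency_threshold:
            known = self.solutions.get(pattern)
            if known is not None and known > self.paths:
                # Predictive step: reuse the path count saved for this pattern.
                self.paths = known
            elif self.paths < self.max_paths:
                # Gradual path expansion: open one more alternative path.
                self.paths += 1
        elif self.paths > 1:
            # Latency is under control: link the solution to the pattern.
            self.solutions[pattern] = self.paths
        return self.paths

    def reset(self) -> None:
        """Return to a single path, e.g. when a communication phase ends."""
        self.paths = 1
```

In this sketch, the first time a pattern causes congestion the balancer expands one path per high-latency sample until latency drops, then records the path count. When the same pattern reappears after reset(), the first high-latency sample jumps directly to the recorded path count, which mirrors the quick convergence on repetitive patterns that the abstract describes.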
Year
2013
DOI
10.1016/j.procs.2013.05.181
Venue
Procedia Computer Science
Keywords
Interconnection networks, High performance computing, Predictive routing, HPC clusters, Parallel scientific applications, Application-aware routing
Field
Data mining, Reduced cost, Airfield traffic pattern, Supercomputer, Static routing, Computer science, Latency (engineering), Computer network, Provisioning, Bandwidth (signal processing), Interconnection, Distributed computing
DocType
Conference
Volume
18
ISSN
1877-0509
Citations
1
PageRank
0.40
References
10
Authors
5
Name                   Order  Citations  PageRank
Carlos Nunez Castillo  1      1          0.40
Diego Lugones          2      35         9.77
Daniel Franco          3      24         6.18
Emilio Luque           4      1097       176.18
Martin Collier         5      309        26.55