Abstract |
---|
Collective operations, such as allreduce, are widely regarded as critical limiting factors in achieving high performance in massively parallel applications. Conventional host-based implementations, which introduce a large number of point-to-point communications, are less efficient in large-scale systems. To address this issue, we propose a switch chip design that accelerates collective operations, especially the allreduce operation. The major advantage of the proposed solution is its high scalability, since expensive point-to-point communications are avoided. Two kinds of allreduce operations, namely block-allreduce and burst-allreduce, are implemented for short and long messages, respectively. We evaluated the proposed design with both a cycle-accurate simulator and an FPGA prototype system. The experimental results show that the switch-based allreduce implementation is efficient and scalable, especially in large-scale systems. In the prototype, our switch-based implementation significantly outperforms the host-based one, with a 16-fold improvement in MPI time on 16 nodes. Furthermore, the simulation shows that, when scaling from 2 to 4096 nodes, the switch-based allreduce latency increases by less than 2 µs. |
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/ICCCN.2013.6614098 | ICCCN |
Keywords | Field | DocType |
radio links,block-allreduce,collective operations,switch-based allreduce latency,burst-allreduce,message passing interface,massively parallel applications,cycle-accurate simulator,allreduce operation,large-scale systems,switch-based solution,point-to-point communications,host-based implementations,message passing,FPGA prototype system,field programmable gate arrays,MPI time,critical limiting factors | Massively parallel,Computer science,Latency (engineering),Parallel computing,Computer network,FPGA prototype,Field-programmable gate array,Implementation,Chip,Message passing,Scalability,Distributed computing | Conference |
Volume | Issue | ISBN |
null | null | 978-1-4673-5774-6 |
Citations | PageRank | References |
0 | 0.34 | 18 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nongda Hu | 1 | 8 | 1.56 |
Dawei Wang | 2 | 3 | 1.09 |
Zheng Cao | 3 | 0 | 3.04 |
Xuejun An | 4 | 6 | 6.26 |
SUN Ning-Hui | 5 | 1268 | 97.37 |