Title |
---|
ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library |
Abstract |
---|
Distributed systems have been widely adopted for deep neural network model training. However, the scalability of distributed training systems is largely bounded by communication cost. We design a highly efficient collective communication library, the Alibaba Collective Communication Library (ACCL), to build distributed training systems with linear scalability. ACCL provides optimized algor... |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/MM.2021.3091475 | IEEE Micro |

Keywords | DocType | Volume |
---|---|---|
Servers, Training, Bandwidth, Routing, Fabrics, Payloads, Parallel algorithms | Journal | 41 |

Issue | ISSN | Citations |
---|---|---|
5 | 0272-1732 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 0 | 20 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jianbo Dong | 1 | 4 | 1.46 |
Shaochuang Wang | 2 | 0 | 0.34 |
Fei Feng | 3 | 26 | 1.85 |
Zheng Cao | 4 | 6 | 2.86 |
Heng Pan | 5 | 0 | 0.34 |
Lingbo Tang | 6 | 26 | 1.85 |
Pengcheng Li | 7 | 0 | 0.34 |
Hao Li | 8 | 25 | 11.35 |
Qianyuan Ran | 9 | 0 | 0.34 |
Yiqun Guo | 10 | 0 | 0.34 |
Shanyuan Gao | 11 | 0 | 0.34 |
Xin Long | 12 | 0 | 0.34 |
Jie Zhang | 13 | 47 | 15.01 |
Yong Li | 14 | 0 | 0.34 |
Zhisheng Xia | 15 | 0 | 0.34 |
Liuyihan Song | 16 | 4 | 2.15 |
Yingya Zhang | 17 | 21 | 3.81 |
Pan Pan | 18 | 3 | 4.16 |
Guohui Wang | 19 | 1088 | 60.78 |
Xiaowei Jiang | 20 | 5 | 1.76 |