Abstract | | |
---|---|---|
Sunway TaihuLight, China's recent world top-ranked supercomputer, was the first to be built entirely with home-grown processors. It can be programmed with two approaches: directive-based OpenACC and native programming. We study both approaches using GTC-P, a particle-in-cell code for investigating micro-turbulence in magnetic fusion plasmas, and compare the performance and programming effort of the OpenACC and native versions of GTC-P. The results show that in the OpenACC version, the kernel with irregular memory access becomes the main performance bottleneck due to poor data locality. To address this issue, we applied two optimizations to the native version: (1) register-level communication (RLC); and (2) an "asynchronization" strategy. With these two optimizations, the native version achieves up to 2.5x speedup for the memory-bound kernel compared with the OpenACC version. In addition, we have scaled GTC-P to 4,259,840 cores of TaihuLight and present performance comparisons with several world-leading supercomputers. |

Year | DOI | Venue |
---|---|---|
2018 | 10.1109/CLUSTER.2018.00021 | 2018 IEEE International Conference on Cluster Computing (CLUSTER) |

Keywords | Field | DocType |
---|---|---|
Sunway TaihuLight, GTC-P, optimization, OpenACC | Kernel (linear algebra), Bottleneck, Locality, Supercomputer, Computer science, Parallel computing, Bandwidth (signal processing), Sunway TaihuLight, Speedup | Conference |

ISSN | ISBN | Citations |
---|---|---|
1552-5244 | 978-1-5386-8320-0 | 1 |

PageRank | References | Authors |
---|---|---|
0.38 | 13 | 7 |

Name | Order | Citations | PageRank |
---|---|---|---|
Linjin Cai | 1 | 2 | 0.74 |
Yichao Wang | 2 | 2 | 1.10 |
William Tang | 3 | 17 | 2.31 |
Bei Wang | 4 | 528 | 61.48 |
Stephane Ethier | 5 | 291 | 31.10 |
Zhao Liu | 6 | 25 | 10.73 |
James Lin | 7 | 12 | 4.37 |