Abstract | ||
---|---|---|
We present novel optimizations of the fusion plasma simulation code GTC on the Tianhe-2 supercomputer. The simulation exhibits excellent weak scalability up to 3072 31S1P Xeon Phi co-processors. An unprecedented performance improvement of up to 5.8x is achieved for GTC on Tianhe-2. An efficient particle exchange algorithm is developed that simplifies the original iterative scheme to a direct implementation, which leads to a 7.9x improvement in MPI communication performance on 1024 nodes of Tianhe-2. A customized particle sorting algorithm is presented that delivers a 2.0x performance improvement on the co-processor for the particle computation kernel. A smart offload algorithm that minimizes data exchange between host and co-processor is introduced. Other optimizations, such as loop fusion and vectorization, are also presented. |
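
The abstract does not describe how the customized particle sorting works. As a rough illustration of why sorting particles helps a particle kernel, the sketch below uses a generic stable counting sort keyed on a grid-cell index, so that particles touching the same cell end up adjacent in memory. This is not the paper's algorithm; the function name `counting_sort_by_cell` and the inputs `cell_ids` and `num_cells` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def counting_sort_by_cell(cell_ids: Sequence[int], num_cells: int) -> List[int]:
    """Return particle indices ordered by grid-cell index (stable).

    Hypothetical helper for illustration only; it is a textbook
    counting sort, not the customized algorithm from the paper.
    """
    # counts[c + 1] holds the number of particles in cell c.
    counts = [0] * (num_cells + 1)
    for c in cell_ids:
        counts[c + 1] += 1

    # Prefix sum: counts[c] becomes the first output slot for cell c.
    for c in range(num_cells):
        counts[c + 1] += counts[c]

    # Scatter each particle index into the next free slot of its cell.
    order = [0] * len(cell_ids)
    for particle, c in enumerate(cell_ids):
        order[counts[c]] = particle
        counts[c] += 1
    return order


# Five particles spread over three cells (made-up data).
cell_ids = [2, 0, 1, 2, 0]
order = counting_sort_by_cell(cell_ids, num_cells=3)
sorted_cells = [cell_ids[i] for i in order]
print(order)         # [1, 4, 2, 0, 3]
print(sorted_cells)  # [0, 0, 1, 2, 2]
```

Applying `order` to the particle arrays groups particles by cell, which tends to make grid accesses more contiguous and therefore friendlier to caches and vector units. A counting sort is used here because cell indices are small integers, so the pass is linear in the number of particles.
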
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/ScalA.2016.8 | ScalA@SC |
Keywords | DocType | ISBN |
Fusion plasmas simulation, GTC, scalability, Tianhe-2, Xeon Phi | Conference | 978-1-5090-5223-3 |
Citations | PageRank | References |
0 | 0.34 | 10 |
Authors | ||
9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Endong Wang | 1 | 7 | 5.62 |
Shaohua Wu | 2 | 0 | 0.34 |
Qing Zhang | 3 | 0 | 0.34 |
Y. J. Liu | 4 | 11 | 16.87 |
Wenlu Zhang | 5 | 220 | 11.44 |
Zhihong Lin | 6 | 5 | 1.17 |
Yutong Lu | 7 | 307 | 53.61 |
Yunfei Du | 8 | 72 | 14.62 |
Xiaoqian Zhu | 9 | 86 | 16.92 |