Title
Accelerating the multi-zone scalar pentadiagonal CFD algorithm with OpenACC
Abstract
The multi-zone scalar pentadiagonal (SP-MZ) benchmark, part of the multi-zone NAS Parallel Benchmark suite, is ported to graphics processing units (GPUs) using OpenACC compiler directives. The sequence of optimizations necessary to transform the SP-MZ algorithm from CPU-oriented to GPU-oriented is presented. The performance of the OpenACC implementation on GPUs is measured using predefined mesh sizes. We observe a 30% speed-up using the OpenACC implementation on an NVIDIA Kepler K40 GPU compared to an eight-core Intel Xeon E5-2670 CPU with the small Class-A mesh (256 thousand points). Setting inter-zone boundary conditions directly on the device reduced run-time by 22% due to the high cost of host-device communication. Multi-device benchmarks with the larger Class-C mesh (4.3 million points) were scaled to 32 GPU nodes and matched or outperformed the CPU baseline with ten cores per node. Combining both CPU and GPU computing power improved the throughput on the Class-C mesh by 75%. We define a larger zone size with one million points per node to better reflect modern usage with codes similar to SP-MZ. The OpenACC GPU implementation outperformed the baseline multi-core CPU by 29% on this real-world mesh size.
Year
2015
DOI
10.1145/2832105.2832110
Venue
WACCPD@SC
Field
Graphics, Central processing unit, CUDA, Computer science, Parallel computing, Algorithm, Compiler, Porting, General-purpose computing on graphics processing units, Throughput, Xeon
DocType
Conference
Citations
3
PageRank
0.49
References
0
Authors
2
Name
Order
Citations
PageRank
Christopher P. Stone150.93
Bracy H. Elton2114.27