Title
Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library.
Abstract
We present an analysis of optimizing the performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm in order to show that compilers can optimize Alpaka code effectively when key parameters of the algorithm are tuned. We do not intend to rival existing, highly optimized DGEMM versions, but merely choose this example to prove that Alpaka allows for platform-specific tuning with a single source code. In addition we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code on bleeding-edge architectures such as Nvidia’s Tesla P100, Intel’s Knights Landing (KNL) and Haswell architectures, as well as IBM’s Power8 system. On some of these we are able to reach almost 50% of the peak floating-point performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOPS on a P100 and over 1 TFLOPS on a KNL system.
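The abstract's central idea is that key algorithmic parameters, such as GEMM blocking sizes, can be tuned per platform while the kernel source stays untouched. The sketch below illustrates that kind of tuning knob in plain C++11: a cache-blocked GEMM whose tile size is a compile-time template parameter. It is not the paper's Alpaka kernel; the function name gemm_blocked, the TILE parameter, and the chosen tile size are assumptions made purely for illustration.

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// C += A * B for square n x n matrices in row-major layout.
// TILE is the blocking factor that would be tuned per target architecture.
template <std::size_t TILE>
void gemm_blocked(std::size_t n,
                  std::vector<double> const& a,
                  std::vector<double> const& b,
                  std::vector<double>& c)
{
    for (std::size_t i0 = 0; i0 < n; i0 += TILE)
        for (std::size_t k0 = 0; k0 < n; k0 += TILE)
            for (std::size_t j0 = 0; j0 < n; j0 += TILE)
                // Process one TILE x TILE block; std::min handles the edges.
                for (std::size_t i = i0; i < std::min(i0 + TILE, n); ++i)
                    for (std::size_t k = k0; k < std::min(k0 + TILE, n); ++k)
                    {
                        double const aik = a[i * n + k];
                        for (std::size_t j = j0; j < std::min(j0 + TILE, n); ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}

int main()
{
    std::size_t const n = 256;
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);

    // The tile size 64 is only a placeholder; the paper's point is that such
    // parameters are tuned per target (GPU, KNL, Haswell, Power8) while the
    // kernel source itself stays unchanged.
    gemm_blocked<64>(n, a, b, c);

    std::cout << "c[0] = " << c[0] << "\n"; // expect n * 1.0 * 2.0 = 512
    return 0;
}

Swapping the template argument, or adding vendor-specific #pragmas around the innermost loop, is the kind of per-platform adjustment the paper evaluates, while the loop nest itself is shared across all targets.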
Year
2017
DOI
10.1007/978-3-319-67630-2_36
Venue
ISC Workshops
Volume
abs/1706.10086
Conference
J.M. Kunkel et al. (Eds.): ISC High Performance Workshops 2017, LNCS 10524, pp. 496-514, 2017
Citations
0
PageRank
0.34
References
4
Authors
6
Name | Order | Citations | PageRank
Alexander Matthes | 1 | 1 | 1.03
R. Widera | 2 | 22 | 2.38
Erik Zenker | 3 | 0 | 0.34
Benjamin Worpitz | 4 | 0 | 0.34
A. Huebl | 5 | 22 | 2.38
M. Bussmann | 6 | 23 | 3.80