Title
Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU
Abstract
The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. But guaranteeing portability relies heavily on platform-specific implementations. In this paper, we provide an OpenCL implementation on an ARMv8 multi-core CPU, which efficiently maps the generic OpenCL platform model to the ARMv8 multi-core architecture. With this implementation, we first characterize the maximum achieved arithmetic throughput and memory access bandwidth on the architecture, and measure the OpenCL-related overheads. Our results demonstrate that there is room for optimizing OpenCL kernel performance. Then, we compare the performance of OpenCL against serial and OpenMP codes using 11 benchmarks. The experimental results show that (1) the OpenCL implementation can achieve an average speedup of 6x compared to its OpenMP counterpart, and (2) GPU-specific OpenCL codes are often unsuitable for this ARMv8 multi-core CPU.
Year
2017
DOI
10.1109/ISPA/IUCC.2017.00131
Venue
2017 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 16th IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC 2017)
Keywords
OpenCL, FT-1500A, performance, programming
Field
Kernel (linear algebra), Computer science, Parallel computing, Implementation, Bandwidth (signal processing), Human–computer interaction, Software portability, Throughput, Multi-core processor, Speedup
DocType
Conference
ISSN
2158-9178
Citations
1
PageRank
0.35
References
0
Authors
5
Name            Order   Citations   PageRank
Jianbin Fang    1       265         25.31
Peng Zhang      2       48          5.09
Tao Tang        3       42          7.44
Chun Huang      4       13          8.00
Canqun Yang     5       188         29.39