Title
Base64 Encoding on Heterogeneous Computing Platforms
Abstract
Base64 encoding has many applications on the Web. Previous studies investigated optimizations of the Base64 encoding algorithm on central processing units (CPUs). In this paper, we describe optimizations of the algorithm on heterogeneous computing platforms. More specifically, we explain the algorithm, convert it to kernels written in CUDA C/C++ and Open Computing Language (OpenCL), optimize the CUDA and OpenCL applications with CUDA and OpenCL streams, which can overlap data transfers with kernel computations, and vectorize the CUDA and OpenCL kernels to improve kernel throughput. We evaluate the impact of the number of streams on kernel performance on an NVIDIA Pascal P100 graphics processing unit (GPU) and a Nallatech 385A card that features an Intel Arria 10 GX1150 field-programmable gate array (FPGA). We also measure the performance and power of the applications on the CPU, GPU, and FPGA to understand the advantages of each platform and the benefit of kernel offloading. The experiments show that using vector data types in the kernels does not improve performance, and that more work-items are better than larger vectors per work-item on the GPU. OpenCL and CUDA streams achieve almost the same performance on the GPU, but streams should be used with caution when GPU resources are underutilized. On the FPGA, kernel vectorization using 16 vector lanes achieves the highest performance when the number of streams is one. However, increasing the vector width per work-item and the number of streams can decrease the kernel computation time for each stream, and thereby reduce the number of concurrent operations across the streams. While the raw performance on the GPU is 3.1X higher than that on the FPGA, the FPGA consumes 3.4X less power. A comparison with a state-of-the-art implementation on an Intel CPU server shows an increasing benefit of kernel offloading.
Year: 2019
DOI: 10.1109/ASAP.2019.00014
Venue: 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Keywords: Heterogeneous computing, GPU, FPGA, Base64 encoding, CUDA, OpenCL, Stream
Field: Kernel (linear algebra), Central processing unit, CUDA, Computer science, Parallel computing, Field-programmable gate array, Vectorization (mathematics), Symmetric multiprocessor system, Image tracing, Graphics processing unit
DocType: Conference
Volume: 2160-052X
ISSN: 2160-0511
ISBN: 978-1-7281-1602-0
Citations: 0
PageRank: 0.34
References: 12
Authors: 2
Name, Order, Citations, PageRank
Zheming Jin, 1, 17, 11.95
Hal Finkel, 2, 6, 3.21