Title
Performance Evaluation of a 3D-Stencil Library for Distributed Memory Array Accelerators
Abstract
EMAX (Energy-aware Multimode Accelerator Extension) is an accelerator equipped with distributed single-port local memories and ring-formed interconnections. It is designed to achieve extremely high throughput for scientific computations, big data, and image processing while keeping power consumption low. However, before mapping algorithms onto the accelerator, application developers need sufficient knowledge of its hardware organization and specially designed instructions. Furthermore, when no well-designed compiler or library is available, they must make significant efforts to tune the code to improve execution efficiency. To address this problem, we focus on library support for stencil (nearest-neighbor) computations, a class of algorithms widely used in partial differential equation (PDE) solvers. In this research, we address the following topics: (1) the system configuration, features, and mnemonics of EMAX; (2) instruction mapping techniques that reduce the amount of data read from main memory; and (3) a performance evaluation of the library for PDE solvers. Because the library can reuse local data across outer-loop iterations and can map many instructions by unrolling outer loops, the amount of data read from main memory is reduced to as little as 1/7 of that required by hand-tuned code. In addition, the stencil library was found to reduce execution time by 23% compared with a general-purpose processor.
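The idea of reusing local data across outer-loop iterations can be illustrated with a small sketch. The code below is not the EMAX library or its instruction mapping; it is a plain Python model written for this record. It assumes a 7-point 3D stencil and counts every value fetched from the input grid as one main-memory read. The function names (make_grid, stencil_naive, stencil_plane_reuse) and the read-counting model are hypothetical. Whether the 1/7 figure in the abstract arises in exactly this way is also an assumption.

```python
def make_grid(nz, ny, nx):
    """Build a small 3D grid with deterministic values (illustrative data only)."""
    return [[[float(z * 100 + y * 10 + x) for x in range(nx)]
             for y in range(ny)] for z in range(nz)]


def stencil_naive(grid):
    """7-point average; every neighbor access is counted as a main-memory read."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    reads = 0
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                total = (grid[z][y][x]
                         + grid[z - 1][y][x] + grid[z + 1][y][x]
                         + grid[z][y - 1][x] + grid[z][y + 1][x]
                         + grid[z][y][x - 1] + grid[z][y][x + 1])
                reads += 7  # assumed model: no reuse between points
                out[z][y][x] = total / 7.0
    return out, reads


def stencil_plane_reuse(grid):
    """Same stencil, with three z-planes held in a modeled local buffer.

    Each outer (z) iteration fetches only the newly needed plane; the other
    two planes are reused from the previous iteration. Requires nz >= 3.
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    reads = 0

    def load_plane(z):
        # Copying a plane stands in for a transfer into local memory.
        nonlocal reads
        reads += ny * nx
        return [row[:] for row in grid[z]]

    below, mid = load_plane(0), load_plane(1)
    for z in range(1, nz - 1):
        above = load_plane(z + 1)  # the only main-memory traffic this iteration
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                total = (mid[y][x]
                         + below[y][x] + above[y][x]
                         + mid[y - 1][x] + mid[y + 1][x]
                         + mid[y][x - 1] + mid[y][x + 1])
                out[z][y][x] = total / 7.0
        below, mid = mid, above  # rotate the buffer for the next outer iteration
    return out, reads


if __name__ == "__main__":
    g = make_grid(34, 34, 34)
    ref, naive_reads = stencil_naive(g)
    res, reuse_reads = stencil_plane_reuse(g)
    print("results match:", ref == res)
    print("read ratio (reuse / naive):", reuse_reads / naive_reads)
```

In this model each input value is fetched once instead of up to seven times, so the ratio of reads approaches 1/7 as the grid grows and the boundary planes matter less. Real hardware adds constraints that the sketch ignores, such as local memory capacity and port count.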
Year
2015
DOI
10.1109/CANDAR.2014.100
Venue
IEICE Transactions
Keywords
distributed processing, partial differential equations, storage management, 3d-stencil library, big data, emax features, emax mnemonics, emax system configuration, pde solver, distributed memory array accelerators, distributed single-port local memory, energy-aware multimode accelerator extension, general purpose processor, hardware organization, image processing, instruction mapping techniques, library support, partial differential equation, performance evaluation, ring-formed interconnection, scientific computation, stencil computation, cgra, accelerator, coarse grained reconfigurable architecture, library, optimization, stencil
Field
Computer architecture, Computer science, Stencil, Distributed memory, Computer hardware
DocType
Journal
Volume
98-D
Issue
12
Citations
2
PageRank
0.40
References
1
Authors
4
Name, Order, Citations, PageRank
Yoshikazu Inagaki, 1, 2, 0.40
Shinya Takamaeda-Yamazaki, 2, 65, 16.83
Jun Yao, 3, 395, 47.98
Yasuhiko Nakashima, 4, 128, 32.60