Title
HiMap: Fast and Scalable High-Quality Mapping on CGRA via Hierarchical Abstraction
Abstract
The coarse-grained reconfigurable array (CGRA) has emerged as a promising hardware accelerator because of its excellent balance between reconfigurability, performance, and energy efficiency. The performance of a CGRA depends strongly on a high-quality compiler that maps application kernels onto the architecture. Unfortunately, state-of-the-art compiler technology falls short of generating high-performance mappings within an acceptable compilation time, especially as CGRA size increases. We propose HiMap, a fast and scalable CGRA mapping approach that is also adept at producing close-to-optimal solutions for the regular computational kernels prevalent in existing and emerging application domains. The key strategy behind HiMap's efficiency and scalability is to exploit the regularity in the computation by employing a virtual systolic array (VSA) as an intermediate abstraction layer in a hierarchical mapping. HiMap first maps the loop iterations of the kernel onto a VSA and then distills out the unique patterns in the mapping. These unique patterns are subsequently mapped onto subspaces of the physical CGRA and arranged according to the systolic array mapping to create a complete mapping of the kernel. Experimental results confirm that HiMap can generate application mappings that hit the performance envelope of the CGRA. HiMap offers 17.3× and 5× improvements in the performance and energy efficiency of the mappings, respectively, compared to the state of the art. The compilation time of HiMap for near-optimal mappings is less than 15 min for a 64 × 64 CGRA, while existing approaches take days to generate inferior mappings.
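The pattern-distillation idea in the abstract can be illustrated with a toy sketch. This is not the HiMap algorithm: the dependence vectors in DEPS, the one-iteration-per-virtual-PE placement, and the definition of a "pattern" as the set of in-bounds producers and consumers are all assumptions made here for illustration. The sketch only shows why a regular kernel yields a small number of distinct patterns that does not grow with the array size.

```python
from collections import defaultdict

# Hypothetical dependence vectors of a toy 2-D kernel (an assumption, not
# taken from the paper): iteration (i, j) consumes values produced by
# iterations (i-1, j) and (i, j-1).
DEPS = [(1, 0), (0, 1)]

def signature(i, j, rows, cols):
    """Return which producers and consumers of (i, j) lie inside the space."""
    has_input = tuple(i - di >= 0 and j - dj >= 0 for di, dj in DEPS)
    has_output = tuple(i + di < rows and j + dj < cols for di, dj in DEPS)
    return has_input, has_output

def unique_patterns(rows, cols):
    """Group iterations (one per virtual PE, by assumption) by signature."""
    groups = defaultdict(list)
    for i in range(rows):
        for j in range(cols):
            groups[signature(i, j, rows, cols)].append((i, j))
    return groups

if __name__ == "__main__":
    for n in (4, 16, 64):
        # The pattern count stays at 9 (corners, edges, interior) for n >= 3.
        print(n, len(unique_patterns(n, n)))
```

Under these assumptions, a mapper would only need to place and route one representative per group and replicate the result, which is the intuition behind mapping unique patterns rather than every iteration.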
Year: 2022
DOI: 10.1109/TCAD.2021.3132551
Venue: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Keywords: Coarse-grained reconfigurable array (CGRA), systolic arrays
DocType: Journal
Volume: 41
Issue: 10
ISSN: 0278-0070
Citations: 0
PageRank: 0.34
References: 16
Authors: 5
Name | Order | Citations | PageRank
Dhananjaya Wijerathne | 1 | 2 | 2.05
Zhaoying Li | 2 | 0 | 0.34
Anuj Pathania | 3 | 181 | 14.97
Tulika Mitra | 4 | 2714 | 135.99
Lothar Thiele | 5 | 0 | 1.01