Title
CASCADE: High Throughput Data Streaming via Decoupled Access-Execute CGRA
Abstract
A Coarse-Grained Reconfigurable Array (CGRA) is a promising high-performance, low-power accelerator for compute-intensive loop kernels. While mapping computations onto the CGRA is a well-studied problem, bringing data into the array at high throughput remains a challenge. A conventional CGRA design uses on-array computations to generate the memory addresses for data access, which undermines the attainable throughput. A decoupled access-execute architecture, on the other hand, isolates memory access from the actual computations, resulting in significantly higher throughput. We propose a novel decoupled access-execute CGRA design called CASCADE, with full architecture and compiler support for high-throughput data streaming from an on-chip multi-bank memory. CASCADE offloads the address computations for multi-bank data memory access to custom-designed programmable hardware. An end-to-end, fully automated compiler synchronizes the conflict-free movement of data between the memory banks and the CGRA. Experimental evaluations show, on average, a 3× performance benefit and a 2.2× performance-per-watt improvement for CASCADE compared to an iso-area conventional CGRA that has a bigger processing array in lieu of dedicated hardware memory address generation logic.
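The idea of moving address generation off the compute array can be illustrated with a minimal software sketch. This is not CASCADE's hardware or compiler; it only assumes, for illustration, affine (base plus stride) address streams and a simple cyclic mapping of addresses to banks. All names (`address_stream`, `bank_and_offset`, `conflict_free`, `NUM_BANKS`) and the example kernel are hypothetical.

```python
from typing import Iterator, List, Tuple


def address_stream(base: int, stride: int, count: int) -> Iterator[int]:
    """Yield base, base + stride, ... for `count` loop iterations.

    Stands in for an address generator that runs separately from
    the compute array (an assumption made for illustration).
    """
    for i in range(count):
        yield base + i * stride


def bank_and_offset(addr: int, num_banks: int) -> Tuple[int, int]:
    """Cyclic partitioning: consecutive addresses map to consecutive banks."""
    return addr % num_banks, addr // num_banks


def conflict_free(addrs: List[int], num_banks: int) -> bool:
    """True if all addresses issued in the same cycle hit distinct banks."""
    banks = [bank_and_offset(a, num_banks)[0] for a in addrs]
    return len(set(banks)) == len(banks)


# Hypothetical kernel that reads a[i] and a[i + 1] in every iteration.
NUM_BANKS = 4
stream_a = list(address_stream(base=0, stride=1, count=8))
stream_b = list(address_stream(base=1, stride=1, count=8))

# One pair of addresses is issued per cycle; check each cycle for bank conflicts.
per_cycle = list(zip(stream_a, stream_b))
all_ok = all(conflict_free(list(pair), NUM_BANKS) for pair in per_cycle)
```

In this sketch the compute side would only consume the data that arrives each cycle, while the address streams and the bank check are handled separately, which is the separation of concerns the abstract describes.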
Year: 2019
DOI: 10.1145/3358177
Venue: ACM Transactions on Embedded Computing Systems (TECS)
Keywords: Coarse grained reconfigurable arrays, decoupled access-execute architectures, multi-bank memory partitioning
Field: Computer science, Parallel computing, Cascade, Throughput
DocType: Journal
Volume: 18
Issue: 5s
ISSN: 1539-9087
Citations: 1
PageRank: 0.35
References: 0
Authors: 5
Name | Order | Citations | PageRank
Dhananjaya Wijerathne | 1 | 2 | 2.05
Zhaoying Li | 2 | 2 | 1.03
Manupa Karunarathne | 3 | 2 | 0.69
Anuj Pathania | 4 | 181 | 14.97
Tulika Mitra | 5 | 2714 | 135.99