Abstract |
---|
Recent years have seen the increased adoption of Coarse-Grained Reconfigurable Architectures (CGRAs) as flexible, energy-efficient compute accelerators. Obtaining performance using spatial architectures while supporting diverse applications requires a flexible, high-bandwidth interconnect. Because modern CGRAs support vector units with wide datapaths, designing an interconnect that balances dynamism, communication granularity, and programmability is a challenging task.
In this work, we explore the space of spatial architecture interconnect dynamism, granularity, and programmability. We start by characterizing several benchmarks' communication patterns and showing links' imbalanced bandwidth requirements, fanout, and data width. We then describe a compiler stack that maps applications to both static and dynamic networks and performs virtual channel allocation to guarantee deadlock freedom. Finally, using a cycle-accurate simulator and 28 nm ASIC synthesis, we perform a detailed performance, area, and power evaluation across the identified design space for a variety of benchmarks. We show that the best network design depends on both applications and the underlying accelerator architecture. Network performance correlates strongly with bandwidth for streaming accelerators, and scaling raw bandwidth is more area- and energy-efficient with a static network. We show that the application mapping can be optimized to move less data by using a dynamic network as a fallback from a high-bandwidth static network. This static-dynamic hybrid network provides a 1.8x energy-efficiency and 2.8x performance advantage over the purely static and purely dynamic networks, respectively.
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3307650.3322249 | Proceedings of the 46th International Symposium on Computer Architecture |

Keywords | Field | DocType |
---|---|---|
CGRAs, hardware accelerators, interconnection network, reconfigurable architectures | Dynamic network analysis, Computer architecture, Network planning and design, Computer science, Parallel computing, Deadlock, Bandwidth (signal processing), Granularity, Network performance, Virtual channel, Scalability | Conference |

ISSN | ISBN | Citations |
---|---|---|
1063-6897 | 978-1-4503-6669-4 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 22 | 6 |

Name | Order | Citations | PageRank |
---|---|---|---|
Yaqi Zhang | 1 | 44 | 2.12 |
Alexander Rucker | 2 | 12 | 2.51 |
Matthew Vilim | 3 | 0 | 0.34 |
Raghu Prabhakar | 4 | 40 | 1.70 |
William Hwang | 5 | 21 | 2.16 |
Kunle Olukotun | 6 | 4532 | 373.50 |