Abstract |
---|
This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one application, we start with common sparse data formats, each of which supports multiple applications. Using a declarative programming model, Capstan supports application-independent sparse iteration and memory primitives that can be mapped to vectorized, high-performance hardware. We optimize random-access sparse memories with configurable out-of-order execution to increase SRAM random-access throughput from 32% to 80%. For a variety of sparse applications, Capstan with DDR4 memory is 18× faster than a multi-core CPU baseline, while Capstan with HBM2 memory is 16× faster than an Nvidia V100 GPU. For sparse applications that can be mapped to Plasticine, a recent dense RDA, Capstan is 7.6× to 365× faster and only 16% larger. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3466752.3480047 | MICRO |
DocType | Citations | PageRank |
---|---|---|
Conference | 3 | 0.36 |
References | Authors |
---|---|
0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alexander Rucker | 1 | 12 | 2.51 |
Matthew Vilim | 2 | 3 | 0.36 |
Tian Zhao | 3 | 113 | 13.56 |
Yaqi Zhang | 4 | 3 | 0.36 |
Raghu Prabhakar | 5 | 3 | 0.36 |
Kunle Olukotun | 6 | 4532 | 373.50 |