Abstract | ||
---|---|---|
Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. Using a separable 2D convolution case study, we evaluate the impact of memory latency and bandwidth on performance and compare the results against a state-of-the-art NVIDIA Tesla C2050 GPU. We improve throughput by up to 56.17× and show that the PRF-augmented system outperforms the GPU for \(9\times 9\) or larger mask sizes, even in bandwidth-constrained systems. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/s10766-017-0494-1 | International Journal of Parallel Programming |
Keywords | Field | DocType |
Dataflow computing, Parallel memory accesses, Polymorphic register file, Bandwidth, Vector lanes, Convolution, High performance computing, High-level synthesis | Supercomputer, Computer science, Parallel computing, High-level synthesis, Compiler, Exploit, Dataflow, Bandwidth (signal processing), Throughput, CAS latency | Journal |
Volume | Issue | ISSN |
46 | 6 | 0885-7458 |
Citations | PageRank | References |
1 | 0.36 | 18 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Catalin Bogdan Ciobanu | 1 | 30 | 7.26 |
Georgi Gaydadjiev | 2 | 1117 | 104.92 |
Christian Pilato | 3 | 329 | 32.19 |
D. Sciuto | 4 | 1720 | 176.61 |
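
The separable 2D convolution used as the case study in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' PRF or dataflow implementation. The function names (`convolve_rows`, `separable_conv2d`, `direct_conv2d`) are hypothetical, the code computes only the "valid" output region, and it uses the correlation form (no kernel flip), which coincides with convolution for symmetric masks. It shows why separability matters: a K×K mask built from two 1D masks needs about 2K multiplications per output pixel instead of K².

```python
def convolve_rows(image, kernel):
    """Apply a 1D mask along every row, keeping only the 'valid' outputs."""
    k = len(kernel)
    return [
        [sum(row[x + i] * kernel[i] for i in range(k))
         for x in range(len(row) - k + 1)]
        for row in image
    ]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def separable_conv2d(image, row_kernel, col_kernel):
    """Two 1D passes: a horizontal pass, then a vertical pass."""
    horizontal = convolve_rows(image, row_kernel)
    # Transposing lets the same row routine perform the vertical pass.
    vertical = convolve_rows(transpose(horizontal), col_kernel)
    return transpose(vertical)


def direct_conv2d(image, row_kernel, col_kernel):
    """Reference: apply the full 2D mask (outer product of the 1D masks)."""
    kh, kw = len(col_kernel), len(row_kernel)
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[y + j][x + i] * col_kernel[j] * row_kernel[i]
             for j in range(kh) for i in range(kw))
         for x in range(out_w)]
        for y in range(out_h)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    mask = [1, 2, 1]  # illustrative symmetric 1D mask, not taken from the paper
    print(separable_conv2d(image, mask, mask))
    print(direct_conv2d(image, mask, mask))
```

Both functions return the same result for any pair of 1D masks. The two-pass version touches each row and then each column in turn, which is the kind of regular, parallel access pattern that a multi-lane register file is meant to serve.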