Abstract

Reconfigurable architectures provide an opportunity to accelerate a wide range of applications, frequently by exploiting data parallelism, where the same operations are executed homogeneously on a (large) set of data. However, when the sequential code is executed on a host CPU and only the data-parallel loops are executed on an FPGA coprocessor, a sufficiently large number of loop iterations (the trip count) is required to amortize the control- and data-transfer overheads to the coprocessor. Yet the trip count of large data-parallel loops is frequently not known at compile time, but only at runtime, just before the loop is entered. Therefore, we propose to generate code for both the CPU and the coprocessor, and to defer the decision of where to execute the loop to the application's runtime, when its trip count can be determined. We demonstrate how an LLVM compiler-based toolflow can automatically insert the appropriate decision blocks into the application code. Analyzing popular benchmark suites, we show that this kind of runtime decision is often applicable. The practical feasibility of our approach is demonstrated by a toolflow that automatically identifies loops suitable for vectorization and generates code for the FPGA coprocessor of a Convey HC-1. The toolflow adds decisions that compare the runtime-computed trip counts against loop-specific thresholds, and it also supports moving only the required data to the coprocessor. We evaluate the integrated toolflow with characteristic loops executed on different input data sizes.
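The runtime offloading decision described in the abstract can be illustrated with a small sketch. It is a minimal example under stated assumptions, not the authors' toolflow. The abstract says only that runtime-computed trip counts are compared against loop-specific thresholds. The linear cost model used here to derive such a threshold, the names `CostModel`, `break_even_threshold`, `saxpy_cpu`, `saxpy_accel` and `dispatch`, and all numbers are hypothetical. Under this model, offloading pays off once `fixed_overhead + n * (accel_per_iter + xfer_per_iter) < n * cpu_per_iter`, that is, once `n` exceeds `fixed_overhead / gain`.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CostModel:
    """Hypothetical linear cost model; every field is an illustrative assumption."""
    fixed_overhead: float   # control-transfer cost per offload (e.g. kernel launch)
    cpu_per_iter: float     # host CPU time per loop iteration
    accel_per_iter: float   # coprocessor compute time per iteration
    xfer_per_iter: float    # data-transfer time per iteration


def break_even_threshold(m: CostModel) -> float:
    """Smallest trip count n with fixed_overhead + n*(accel+xfer) < n*cpu.

    Returns math.inf if offloading never pays off under this model.
    """
    gain = m.cpu_per_iter - (m.accel_per_iter + m.xfer_per_iter)
    if gain <= 0:
        return math.inf
    # Smallest integer strictly greater than fixed_overhead / gain.
    return math.floor(m.fixed_overhead / gain) + 1


def saxpy_cpu(a: float, x: List[float], y: List[float]) -> None:
    # Host version of a data-parallel loop: y[i] = a*x[i] + y[i]
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]


def saxpy_accel(a: float, x: List[float], y: List[float]) -> None:
    # Hypothetical stand-in for the coprocessor version. A real toolflow would
    # copy x and y to coprocessor memory, launch the kernel, and copy y back.
    # Here it reuses the host loop so that the sketch runs anywhere.
    saxpy_cpu(a, x, y)


def dispatch(n: int, threshold: float,
             cpu: Callable[[], None], accel: Callable[[], None]) -> str:
    """Decision block: the trip count n is only known at runtime."""
    if n >= threshold:
        accel()
        return "accel"
    cpu()
    return "cpu"


if __name__ == "__main__":
    model = CostModel(fixed_overhead=1000.0, cpu_per_iter=2.0,
                      accel_per_iter=0.5, xfer_per_iter=0.5)
    threshold = break_even_threshold(model)
    x, y = [1.0, 2.0, 3.0, 4.0], [0.0] * 4
    where = dispatch(len(x), threshold,
                     lambda: saxpy_cpu(2.0, x, y), lambda: saxpy_accel(2.0, x, y))
    print(threshold, where, y)  # 1001 cpu [2.0, 4.0, 6.0, 8.0]
```

Both loop versions exist side by side, and only the cheap comparison in `dispatch` runs before the loop. With only four iterations, the illustrative overhead of 1000 time units cannot be amortized, so the host version is chosen.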
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/ReConFig.2014.7032509 | ReConFigurable Computing and FPGAs |
Keywords | Field | DocType |
---|---|---|
coprocessors,field programmable gate arrays,program compilers,program control structures,reconfigurable architectures,CPU,Convey HC-1,FPGA coprocessor,LLVM compiler based toolflow,accelerator offloading decision,application runtime,compile time,control-transfer overhead,data-parallelism,data-transfer overhead,decision block,large data-parallel loop,loop iteration,reconfigurable architecture,runtime-computed trip count,sequential code | Central processing unit,Computer science,Compile time,Parallel computing,Vectorization (mathematics),Field-programmable gate array,Real-time computing,Compiler,Memory management,Coprocessor,Benchmark (computing),Embedded system | Conference |
ISSN | Citations | PageRank |
---|---|---|
2325-6532 | 2 | 0.42 |
References | Authors |
---|---|
19 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gavin Vaz | 1 | 12 | 2.85 |
Heinrich Riebler | 2 | 13 | 3.58 |
Tobias Kenter | 3 | 13 | 6.07 |
Christian Plessl | 4 | 297 | 35.98 |