Title
Deferring accelerator offloading decisions to application runtime
Abstract
Reconfigurable architectures provide an opportunity to accelerate a wide range of applications, frequently by exploiting data parallelism, where the same operations are executed uniformly on a (large) set of data. However, when the sequential code runs on a host CPU and only the data-parallel loops run on an FPGA coprocessor, a sufficiently large number of loop iterations (the trip count) is required to amortize the control- and data-transfer overheads to the coprocessor. The trip count of large data-parallel loops is frequently not known at compile time, but only at runtime just before the loop is entered. Therefore, we propose to generate code for both the CPU and the coprocessor, and to defer the decision of where to execute the loop to the application's runtime, when the trip count can be determined. We demonstrate how an LLVM-based compiler toolflow can automatically insert appropriate decision blocks into the application code. Analyzing popular benchmark suites, we show that this kind of runtime decision is often applicable. The practical feasibility of our approach is demonstrated by a toolflow that automatically identifies loops suitable for vectorization and generates code for the FPGA coprocessor of a Convey HC-1. The toolflow adds decisions that compare the runtime-computed trip counts against loop-specific thresholds, and it also includes support for moving only the required data to the coprocessor. We evaluate the integrated toolflow with characteristic loops executed on different input data sizes.
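The runtime decision described in the abstract can be illustrated with a minimal C sketch. It is not the authors' generated code: the threshold value, the function names, and the coprocessor stub are all hypothetical, and the stub simply runs on the host so that the example compiles without any Convey HC-1 software. What it shows is only the shape of a decision block that compares a trip count known at runtime against a per-loop threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-loop threshold: the smallest trip count at which
   offloading is assumed to amortize the transfer overheads. A real value
   would have to be measured on the target platform. */
#define OFFLOAD_THRESHOLD 4096u

/* Reports which code version the decision block selected. */
typedef enum { RAN_ON_CPU, RAN_ON_COPROCESSOR } target_t;

/* CPU version of the data-parallel loop: out[i] = a[i] + b[i]. */
static void vec_add_cpu(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Stand-in for the coprocessor version (assumption). A real toolflow would
   move the required data to the device, start the kernel, and fetch the
   results; here the same loop runs on the host so the sketch is portable. */
static void vec_add_coprocessor_stub(const float *a, const float *b,
                                     float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Decision block: the trip count n is only known here, at runtime,
   just before the loop would be entered. */
target_t vec_add_dispatch(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

    if (n >= OFFLOAD_THRESHOLD) {
        vec_add_coprocessor_stub(a, b, out, n);
        return RAN_ON_COPROCESSOR;
    }
    vec_add_cpu(a, b, out, n);
    return RAN_ON_CPU;
}
```

Calling vec_add_dispatch with a small n takes the CPU path, while a call with n at or above OFFLOAD_THRESHOLD takes the stubbed coprocessor path; both paths produce the same results, which is the property that makes deferring the choice to runtime safe.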
Year
2014
DOI
10.1109/ReConFig.2014.7032509
Venue
ReConFigurable Computing and FPGAs
Keywords
coprocessors,field programmable gate arrays,program compilers,program control structures,reconfigurable architectures,CPU,Convey HC-1,FPGA coprocessor,LLVM compiler based toolflow,accelerator offloading decision,application runtime,compile time,control-transfer overhead,data-parallelism,data-transfer overhead,decision block,large data-parallel loop,loop iteration,reconfigurable architecture,runtime-computed trip count,sequential code
Field
Central processing unit,Computer science,Compile time,Parallel computing,Vectorization (mathematics),Field-programmable gate array,Real-time computing,Compiler,Memory management,Coprocessor,Benchmark (computing),Embedded system
DocType
Conference
ISSN
2325-6532
Citations
2
PageRank
0.42
References
19
Authors
4
Name, Order, Citations, PageRank
Gavin Vaz, 1, 12, 2.85
Heinrich Riebler, 2, 13, 3.58
Tobias Kenter, 3, 13, 6.07
Christian Plessl, 4, 297, 35.98