FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs - Citegraph

Paper Info

Title
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs

Abstract
As growing power dissipation and thermal effects halted the trend of rising clock frequencies and threatened to undermine Moore's law, the computing industry turned to parallel processing as its route to higher performance. The rise of multicore systems in all domains of computing has opened the door to heterogeneous multiprocessors, in which processors with different compute characteristics can be combined to boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for accelerating compute-intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is currently not a push-button task: the programmer often has to expose the application's fine- and coarse-grained parallelism through special APIs. CUDA is one such parallel computing API; it is driven by the GPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model to a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool that enables FPGA programming at a high level of abstraction. FCUDA is based on source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing on FPGAs.
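The abstract says FCUDA transforms SPMD CUDA thread blocks into parallel C code for AutoPilot. As a rough illustration of that general idea only, the sketch below assumes a simple vector-add kernel and makes each block's implicit threads an explicit "thread loop". The grid's independent blocks become an outer loop. It is not FCUDA's actual output and contains no AutoPilot-specific directives. The names `dim3_t`, `vec_add_block` and `vec_add_grid` are hypothetical.

```c
/* Hypothetical stand-in for CUDA's built-in dim3; only x is used here. */
typedef struct { int x; } dim3_t;

/* Original CUDA kernel, shown for reference only (not compiled here):
 *
 *   __global__ void vec_add(const float *a, const float *b, float *c, int n) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 */

/* One thread block after a source-to-source rewrite: the block's implicit
 * SPMD threads become explicit iterations of a loop over the thread index.
 * An HLS tool could pipeline or unroll this loop (fine-grained parallelism). */
static void vec_add_block(const float *a, const float *b, float *c, int n,
                          dim3_t blockIdx, dim3_t blockDim) {
    for (int tx = 0; tx < blockDim.x; ++tx) {
        int i = blockIdx.x * blockDim.x + tx;
        if (i < n) {          /* same bounds guard as the CUDA kernel */
            c[i] = a[i] + b[i];
        }
    }
}

/* The grid: thread blocks are independent (coarse-grained parallelism).
 * Here they run one after another; a hardware mapping could instead assign
 * blocks to separate custom cores. */
void vec_add_grid(const float *a, const float *b, float *c, int n,
                  dim3_t gridDim, dim3_t blockDim) {
    for (int bx = 0; bx < gridDim.x; ++bx) {
        dim3_t blockIdx = { bx };
        vec_add_block(a, b, c, n, blockIdx, blockDim);
    }
}
```

For example, with `n = 10`, a grid of 3 blocks and 4 threads per block launches 12 logical threads. The `i < n` guard discards the last two, just as it would on a GPU.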

Year: 2009
DOI: 10.1109/SASP.2009.5226333
Venue: San Francisco, CA
Keywords: application program interfaces, field programmable gate arrays, multiprocessing systems, parallel architectures, CUDA kernel, FPGA programming, Moore's law, application program interface, clock frequency, compute unified device architecture, computing industry, field programmable gate array, graphics processing unit, multicore system, multiprocessor system, parallel processing, performance per watt boosting, power dissipation
Field: SPMD, CUDA, Computer science, Parallel computing, Real-time computing, Thread (computing), High-level programming language, Application programming interface, Performance per watt, Concurrent computing, Graphics processing unit
DocType: Conference
ISBN: 978-1-4244-4938-5
Citations: 34
PageRank: 2.02
References: 2
Authors: 6

Authors

Name	Order	Citations	PageRank
Alexandros Papakonstantinou	1	83	6.41
Karthik Gururaj	2	177	12.19
John A. Stratton	3	489	36.44
Deming Chen	4	1432	127.66
Jason Cong	5	7069	515.06
Wen-mei W. Hwu	6	4322	511.62
