Efficient compilation of CUDA kernels for high-performance computing on FPGAs - Citegraph

Paper Info

Title
Efficient compilation of CUDA kernels for high-performance computing on FPGAs

Abstract
The rise of multicore architectures across all computing domains has opened the door to heterogeneous multiprocessors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs, in particular, are becoming very popular for speeding up compute-intensive kernels of scientific, imaging, and simulation applications. New programming models that facilitate parallel processing on heterogeneous systems containing GPUs are spreading rapidly in the computing community. By leveraging these investments, the developers of other accelerators have an opportunity to significantly reduce the programming effort by supporting those accelerator models already gaining popularity. In this work, we adapt one such language, the CUDA programming model, into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool (available from Xilinx) which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SIMT (Single Instruction, Multiple Thread) CUDA code into task-level parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multicore accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.

Year
2013

DOI
10.1145/2514641.2514652

Venue
ACM Trans. Embedded Comput. Syst.

Keywords
new programming model, cuda code, high-abstraction fpga programming, cuda kernel, high-performance computing, efficient compilation, new fpga design flow, fpga multicore accelerator, cuda programming model, computing domain, computing community, programming effort, cuda-to-fpga flow, high level synthesis, high performance computing, parallel programming model, fpga

Field
Computer architecture, Supercomputer, Programming paradigm, Computer science, CUDA, High-level synthesis, Parallel computing, Field-programmable gate array, Real-time computing, Parallel programming model, Performance per watt, Multi-core processor

DocType
Journal

Volume
13

Issue
2

ISSN
1539-9087

Citations
16

PageRank
0.72

References
14

Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Alexandros Papakonstantinou	1	83	6.41
Karthik Gururaj	2	177	12.19
John A. Stratton	3	489	36.44
Deming Chen	4	1432	127.66
Jason Cong	5	1027	87.55
Wen-mei W. Hwu	6	4322	511.62
