Title |
---|
FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow |

Abstract |
---|
High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture (CUDA), enables efficient description and implementation of independent computation cores. HLS tools can effectively translate the many threads of computation present in the parallel descriptions into independent, optimized cores. The generated hardware cores often heavily share input data ... |

Year | DOI | Venue |
---|---|---|
2016 | 10.1109/TVLSI.2015.2497259 | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |

Keywords | Field | DocType |
---|---|---|
Graphics processing units, Bandwidth, Kernel, Ports (Computers), Field programmable gate arrays, Hardware, Parallel processing | Computer architecture, Memory bandwidth, CUDA, Computer science, Field-programmable gate array, Network on a chip, Thread (computing), Register-transfer level, Auxiliary memory, Embedded system, Scalability | Journal |

Volume | Issue | ISSN |
---|---|---|
24 | 6 | 1063-8210 |

Citations | PageRank | References |
---|---|---|
8 | 0.57 | 23 |

Authors |
---|
7 |

Name | Order | Citations | PageRank |
---|---|---|---|
Yao Chen | 1 | 56 | 11.01 |
Swathi T. Gurumani | 2 | 85 | 9.66 |
Yun Liang | 3 | 868 | 59.55 |
Guofeng Li | 4 | 9 | 2.28 |
Donghui Guo | 5 | 107 | 21.93 |
Kyle Rupnow | 6 | 250 | 21.49 |
Deming Chen | 7 | 1432 | 127.66 |