Title
FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow.
Abstract
High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture (CUDA), enables efficient description and implementation of independent computation cores. HLS tools can effectively translate the many threads of computation present in the parallel descriptions into independent, optimized cores. The generated hardware cores often heavily share input data ...
Year
2016
DOI
10.1109/TVLSI.2015.2497259
Venue
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Keywords
Graphics processing units, Bandwidth, Kernel, Ports (Computers), Field programmable gate arrays, Hardware, Parallel processing
Field
Computer architecture, Memory bandwidth, CUDA, Computer science, Field-programmable gate array, Network on a chip, Thread (computing), Register-transfer level, Auxiliary memory, Embedded system, Scalability
DocType
Journal
Volume
24
Issue
6
ISSN
1063-8210
Citations
8
PageRank
0.57
References
23
Authors
7
Name                 Order  Citations  PageRank
Yao Chen             1      56         11.01
Swathi T. Gurumani   2      85         9.66
Yun Liang            3      868        59.55
Guofeng Li           4      9          2.28
Donghui Guo          5      107        21.93
Kyle Rupnow          6      250        21.49
Deming Chen          7      1432       127.66