Abstract |
---|
Scalable systems employing a mix of GPUs and CPUs are becoming increasingly prevalent in high-performance computing (HPC). The presence of such accelerators introduces significant challenges and complexities for both language developers and end users. This paper provides a close study of efficient coordination mechanisms for handling parallel requests from multiple hosts of control to a single GPU in hybrid programming environments. Using a set of microbenchmarks and applications on a GPU cluster, we show that thread-based and process-based context hosting have different tradeoffs. Experimental results on application benchmarks suggest that, in their native form, thread-based context funneling and process-based context switching perform similarly on the latest Fermi GPUs, while manually guided context funneling is currently the best way to achieve optimal performance. |
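The abstract contrasts thread-based context funneling with process-based context switching. The sketch below illustrates one common reading of "funneling": a single owner thread holds the device context, and every other host thread routes its requests through a shared queue instead of touching the device itself. It is a minimal CPU-only sketch under stated assumptions, not the authors' implementation: `FunnelingProxy`, `fake_kernel`, and `run_demo` are hypothetical names, and the "kernel" is a plain Python function standing in for a real GPU launch.

```python
import queue
import threading
from typing import Callable, List


class FunnelingProxy:
    """Hypothetical sketch: one owner thread holds the (simulated) GPU
    context; all other host threads funnel requests through a queue."""

    def __init__(self, device_op: Callable[[List[float]], List[float]]):
        self._device_op = device_op          # stand-in for a kernel launch
        self._requests = queue.Queue()
        self.executor_ids = set()            # threads that ran device_op
        self._owner = threading.Thread(target=self._serve, daemon=True)
        self._owner.start()

    def _serve(self) -> None:
        # Only this thread ever calls device_op, so all requests are
        # serialized onto one context rather than switching contexts.
        while True:
            item = self._requests.get()
            if item is None:                 # shutdown sentinel
                break
            data, reply = item
            self.executor_ids.add(threading.get_ident())
            reply.put(self._device_op(data))

    def submit(self, data: List[float]) -> List[float]:
        reply = queue.Queue(maxsize=1)
        self._requests.put((data, reply))
        return reply.get()                   # block until the owner answers

    def shutdown(self) -> None:
        self._requests.put(None)
        self._owner.join()


def fake_kernel(xs: List[float]) -> List[float]:
    # Hypothetical CPU stand-in for a GPU kernel: square each element.
    return [x * x for x in xs]


def run_demo(num_hosts: int = 4):
    proxy = FunnelingProxy(fake_kernel)
    results = [None] * num_hosts

    def host(i: int) -> None:
        # Each host thread submits work but never owns the context.
        results[i] = proxy.submit([float(i), float(i + 1)])

    threads = [threading.Thread(target=host, args=(i,)) for i in range(num_hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    proxy.shutdown()
    return results, proxy.executor_ids
```

Calling `run_demo(4)` returns one squared pair per host thread, while `executor_ids` contains a single thread identifier, which is the point of funneling: many submitters, one context owner. The tradeoff this sketch makes visible is that submissions are serialized through one thread, in exchange for avoiding per-process context switches on the device.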
Year | DOI | Venue |
---|---|---|
2012 | 10.1145/2381056.2381081 | ACM SIGMETRICS Performance Evaluation Review |
Keywords | Field | DocType |
---|---|---|
close study, latest Fermi GPUs, latest Fermi GPU, efficient coordination mechanism, towards efficient GPU sharing, application benchmarks, multicore processor, GPU cluster, thread-based context, different tradeoffs, end user, process-based context, UPC, multicore processors, multicore | Computer architecture, GPU cluster, End user, Computer science, CUDA, Parallel computing, Thread (computing), Multi-core processor, Hybrid programming, Context switch, Distributed computing, Scalability | Journal |
Volume | Issue | Citations |
---|---|---|
40 | 2 | 7 |
PageRank | References | Authors |
---|---|---|
0.61 | 9 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lingyuan Wang | 1 | 38 | 3.29 |
Miaoqing Huang | 2 | 292 | 27.50 |
Tarek El-Ghazawi | 3 | 427 | 44.88 |