Abstract | | |
---|---|---|
This paper studies the differences in parallel programming between CPUs and GPUs (Graphics Processing Units) and proposes programming techniques for boosting performance on CPUs. We use the raytracer presented at the SIGGRAPH 2004 conference as a case study [2]. At SIGGRAPH, the raytracer was shown to run faster on a GPU than on a CPU, but we show that with our parallel programming techniques for CPUs the raytracer scales well when given multiple CPUs. CPU programs benefit from kernel fusion, stream contraction, and coarse-grained threading. Before these techniques are applied, the raytracer is twice as fast on 4 processors as on a uniprocessor. After applying them, we achieve near-linear speedup: the speedup from 1 to 4 processors improves from 2 to 3.9. This dramatic improvement is due to better memory footprint, locality, and thread granularity. Furthermore, the better memory footprint and locality also improve uniprocessor performance by 100%. Although the improved performance (71 million ray-triangle intersections per second on a 16-way Intel Xeon multiprocessor) is still below the record raytracing performance on the Linux platform (125 million ray-triangle intersections per second [2]), our scalable parallel techniques may continue to scale the performance further as more CPUs become available. |

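
The abstract names kernel fusion and stream contraction without showing them. The following is a minimal sketch of the general idea only, not the authors' raytracer code: `stage_a`, `stage_b`, `unfused`, and `fused` are hypothetical names, and the arithmetic stands in for real per-ray work.

```python
# Hypothetical two-stage stream pipeline. The stage bodies are placeholders
# and are not taken from the paper's raytracer.

def stage_a(x):
    return x * 2.0


def stage_b(y):
    return y + 1.0


def unfused(inputs):
    # Kernel 1 writes a full intermediate stream.
    tmp = [stage_a(x) for x in inputs]
    # Kernel 2 reads that whole stream back.
    return [stage_b(y) for y in tmp]


def fused(inputs):
    # Kernel fusion: both stages run inside one loop.
    # Stream contraction: the intermediate stream shrinks to the scalar t,
    # so no temporary list the size of the input is ever allocated.
    out = []
    for x in inputs:
        t = stage_a(x)
        out.append(stage_b(t))
    return out
```

Both functions return the same results. The fused form avoids materializing the intermediate stream, which is the kind of change the abstract credits for the better memory footprint and locality.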
Year | Venue | Keywords |
---|---|---|
2005 | PDPTA '05: Proceedings of the 2005 International Conference on Parallel and Distributed Processing Techniques and Applications, Vols 1-3 | parallel processing, optimization, streaming, GPU, fusion, contraction, raytracing |

Field | DocType | Citations |
---|---|---|
Computer science, Parallel processing, Parallel computing | Conference | 1 |

PageRank | References | Authors |
---|---|---|
0.52 | 0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Shih-wei Liao | 1 | 703 | 92.73 |
Zhaohui Du | 2 | 196 | 12.76 |
Gansha Wu | 3 | 107 | 9.06 |
Guei-Yuan Lueh | 4 | 401 | 37.41 |