Title
Scaling scientific applications on clusters of hybrid multicore/GPU nodes
Abstract
Rapid advances in the performance and programmability of graphics accelerators have made GPU computing a compelling solution for a wide variety of application domains. However, the added complexity arising from architectural heterogeneity and imbalances in hardware resources poses significant programming challenges in harnessing the performance advantages of GPU-accelerated parallel systems. Moreover, the speedup derived from the GPU is often offset by longer communication latencies and inefficient task scheduling. A suitable parallel programming model is therefore essential to achieve the best possible performance. In this paper, we explore a new hybrid parallel programming model that combines GPU acceleration with the Partitioned Global Address Space (PGAS) programming paradigm. Using the combination of Unified Parallel C (UPC) and CUDA as a case study, we demonstrate that this hybrid model offers programmers both enhanced programmability and powerful heterogeneous execution. Two application benchmarks, NAS Parallel Benchmark (NPB) FT and MG, are used to show the effectiveness of the proposed hybrid approach. Experimental results indicate that both implementations achieve significantly better performance thanks to optimization opportunities offered by the hybrid model, such as the funneled execution mode and fine-grained overlapping of communication and computation.
Year: 2011
DOI: 10.1145/2016604.2016612
Venue: Conf. Computing Frontiers
Keywords: proposed hybrid approach, scientific application, hybrid multicore, performance advantage, suitable parallel programming model, programming paradigm, gpu acceleration, hybrid model, new hybrid parallel programming, gpu node, better performance, possible performance, significant programming challenge, partitioned global address space, multicore, parallel programming model, parallel systems, upc
Field: Computer architecture, Unified Parallel C, Programming paradigm, Computer science, CUDA, Parallel computing, Real-time computing, Parallel programming model, General-purpose computing on graphics processing units, Partitioned global address space, Multi-core processor, Speedup
DocType: Conference
Citations: 8
PageRank: 0.97
References: 12
Authors: 4
Name, Order, Citations, PageRank
Lingyuan Wang, 1, 38, 3.29
Miaoqing Huang, 2, 292, 27.50
Vikram K. Narayana, 3, 102, 13.18
Tarek El-Ghazawi, 4, 427, 44.88