Title
Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures
Abstract
Reduction is a common component of many applications, but it can often be the limiting factor for parallelization. Previous work on reductions has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communication or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from the users of its intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that use intermediate reduction results. Programmers often implement the proposed transformations in an ad-hoc manner, but to the best of our knowledge no previous work has automated these transformations for many-core architectures. Using two benchmark applications, we show that the automatic transformations can yield significant speedup over the original code.
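As a rough illustration of what isolating a reduction from the users of its intermediate results can look like, the sketch below contrasts a fused loop with a split version. It is not taken from the paper: the function names `fused` and `isolated`, the choice of a running sum as the reduction, and the multiplication as the "user" are all assumptions made for the example, and the code is sequential Python rather than GPU code.

```python
from itertools import accumulate


def fused(a, b):
    """Hypothetical original shape: every iteration updates the running
    sum and immediately consumes the intermediate value."""
    total = 0
    out = []
    for x, y in zip(a, b):
        total += x             # reduction step (loop-carried dependence)
        out.append(total * y)  # user of the intermediate reduction result
    return total, out


def isolated(a, b):
    """Hypothetical transformed shape: the reduction runs on its own,
    then the users consume the stored intermediates."""
    # Step 1: the reduction alone, keeping every intermediate value
    # (an inclusive prefix sum). On a many-core device this step could
    # be carried out by a parallel scan.
    partial = list(accumulate(a))
    # Step 2: the users no longer depend on one another, so each element
    # could be computed by an independent thread.
    out = [p * y for p, y in zip(partial, b)]
    total = partial[-1] if partial else 0
    return total, out
```

Both functions return the same final sum and the same per-element outputs; the difference is that the second form confines the loop-carried dependence to a step with a well-known parallel formulation.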
Year
2010
DOI
10.1109/CIT.2010.213
Venue
CIT
Keywords
gpu architectures,previous work,reduction idioms,data communication,automatic transformation,data locality,complex reduction codes,complex reduction code,parallel architectures,reduction,original code,compiler techniques,graphics processors,gpu architecture,reduction operation,generalized reductions,many-core architecture,many-core,intermediate reduction result,coprocessors,gpus,reduction idiom,previous reduction work,intermediate result,information technology,complexity reduction,limiting factor
Field
Locality,Computer science,Information technology,Parallel computing,Limiting factor,Coprocessor,Speedup
DocType
Conference
ISBN
978-1-4244-7547-6
Citations
5
PageRank
0.47
References
10
Authors
3
Name              Order  Citations  PageRank
Xiao-Long Wu      1      5          0.47
Nady Obeid        2      11         1.59
Wen-mei W. Hwu    3      4322       511.62