Title
Architecture-aware optimization targeting multithreaded stream computing
Abstract
Optimizing program execution targeted for Graphics Processing Units (GPUs) can be very challenging. Efficiently mapping serial code to a GPU or stream processing platform is a time-consuming task, greatly hampered by a lack of detail about the underlying hardware. Programmers are left to rely on trial and error to produce optimized code. Recent publication of the underlying instruction set architecture (ISA) of the AMD/ATI GPU has allowed researchers to begin to propose aggressive optimizations. In this work, we present an optimization methodology that utilizes this information to accelerate programs on AMD/ATI GPUs. We start by defining optimization spaces that guide our work. We begin with disassembled machine code and collect program statistics provided by the AMD Graphics Shader Analyzer (GSA) profiling toolset. We explore optimizations targeting three different computing resources: 1) ALUs, 2) fetch bandwidth, and 3) thread usage, and present optimization techniques that consider how to better utilize each resource. We demonstrate the effectiveness of our proposed optimization approach on an AMD Radeon HD3870 GPU using the Brook+ stream programming language. We describe our optimizations using two commonly used GPGPU applications that present very different program characteristics and optimization spaces: matrix multiplication and back-projection for medical image reconstruction. Our results show that optimized code can improve performance by 1.45x to 6.7x compared to unoptimized code run on the same GPU platform. Our optimized implementations run 882x (matrix multiply) and 19x (back-projection) faster than serial implementations run on a 2.66 GHz Intel Core 2 Duo with 2 GB of main memory.
Year
2009
DOI
10.1145/1513895.1513903
Venue
GPGPU
Keywords
architecture-aware optimization, proposed optimization approach, amd graphics shader analyzer, present optimization technique, optimized code, optimization space, optimization methodology, multithreaded stream computing, gpu platform, amd radeon hd3870 gpu, disassembled machine code, ati gpu, stream processing, optimization, gpgpu
Field
Computer architecture, Instruction set, Profiling (computer programming), Computer science, Stream, Parallel computing, Machine code, General-purpose computing on graphics processing units, Shader, Stream processing, Speedup
DocType
Conference
Citations
18
PageRank
1.39
References
6
Authors
4
Name            Order  Citations  PageRank
Byunghyun Jang  1      335        17.56
Synho Do        2      94         12.86
Homer Pien      3      113        10.40
David Kaeli     4      1535       129.85