Title
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Abstract
Ocelot is a dynamic compilation framework designed to map the explicitly data-parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. The dynamic compiler can execute existing CUDA binaries without recompilation from source and supports switching between execution on an NVIDIA GPU and a many-core CPU at runtime. It has been validated against over 130 applications taken from the CUDA SDK, the UIUC Parboil benchmarks [1], the Virginia Rodinia benchmarks [2], the GPU-VSIPL signal and image processing library [3], the Thrust library [4], and several domain-specific applications. This paper presents a high-level overview of the implementation of the Ocelot dynamic compiler, highlighting design decisions and trade-offs and showing their effect on application performance. Several novel code transformations that apply only when compiling explicitly parallel applications are explored, and traditional dynamic compiler optimizations are revisited for this new class of applications. This study is expected to inform the design of compilation tools for explicitly parallel programming models (such as OpenCL) as well as future CPU and GPU architectures.
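The abstract says Ocelot runs CUDA's bulk-synchronous model on many-core CPUs but does not say how threads and barriers are handled. As a hedged illustration only (a generic technique, not Ocelot's PTX-to-LLVM translator), the Python sketch below shows one common way to execute such a kernel on a CPU: each CTA (thread block) runs on a single worker thread, its CUDA threads are serialized in a loop, and the kernel body is split at its barrier into phases so that every thread finishes one phase before any thread starts the next. The kernel (a per-block array reversal through shared memory) and the names `phase_load`, `phase_store`, `PHASES`, `run_cta`, and `launch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical CUDA kernel being emulated (not taken from Ocelot):
#   __shared__ int s[BLOCK];
#   s[t] = data[base + t];
#   __syncthreads();
#   data[base + t] = s[blockDim.x - 1 - t];


def phase_load(data, shared, block_idx, block_dim, tid):
    """Code before the barrier: each thread loads one element into shared memory."""
    shared[tid] = data[block_idx * block_dim + tid]


def phase_store(data, shared, block_idx, block_dim, tid):
    """Code after the barrier: each thread writes back the mirrored element."""
    data[block_idx * block_dim + tid] = shared[block_dim - 1 - tid]


# The kernel body split at its single __syncthreads() into two phases.
PHASES = [phase_load, phase_store]


def run_cta(data, block_idx, block_dim):
    """Run one CTA on one CPU thread by serializing its CUDA threads.

    Running every thread through a phase before any thread starts the next
    phase preserves the barrier semantics without a real barrier primitive.
    """
    shared = [0] * block_dim  # per-CTA stand-in for shared memory
    for phase in PHASES:
        for tid in range(block_dim):
            phase(data, shared, block_idx, block_dim, tid)


def launch(data, grid_dim, block_dim, workers=4):
    """Map independent CTAs onto a pool of CPU worker threads."""
    if len(data) != grid_dim * block_dim:
        raise ValueError("data length must equal grid_dim * block_dim")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_cta, data, b, block_dim) for b in range(grid_dim)]
        for f in futures:
            f.result()  # re-raise any exception raised inside a worker
```

For example, calling `launch(data, grid_dim=2, block_dim=4)` on `data = list(range(8))` reverses each four-element block in place, leaving `[3, 2, 1, 0, 7, 6, 5, 4]`. CTAs can run concurrently because CUDA guarantees no ordering between blocks, whereas threads inside a block must respect the barrier, which is why only the inner loop is serialized phase by phase.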
Year
2010
DOI
10.1145/1854273.1854318
Venue
PACT
Keywords
data parallel execution model, traditional dynamic compiler optimizations, parallel application, nvidia cuda application, ocelot dynamic compiler, cuda binary, heterogeneous system, dynamic optimization framework, bulk-synchronous application, dynamic compiler, cuda sdk, dynamic binary translator, dynamic compilation framework, signal and image processing, virtual machine, dynamic compilation, parallel programming model, code generation
Field
x86, Dynamic compilation, Virtual machine, CUDA, Computer science, Parallel computing, Optimizing compiler, Compiler, Code generation, Execution model, Operating system
DocType
Conference
ISBN
978-1-5090-5032-1
Citations
110
PageRank
5.51
References
19
Authors
4
Name | Order | Citations | PageRank
Gregory Frederick Diamos | 1 | 1117 | 51.07
Andrew Robert Kerr | 2 | 110 | 5.51
Sudhakar Yalamanchili | 3 | 1836 | 184.95
Nathan Clark | 4 | 742 | 32.44