Title
Importance of explicit vectorization for CPU and GPU software performance
Abstract
Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version, indicating the importance of optimizing CPU implementations.
Year
2010
DOI
10.1016/j.jcp.2011.03.041
Venue
Clinical Orthopaedics and Related Research
Keywords
gpu implementation, monte carlo algorithm, gpu software performance, vectorization, fully-optimized cpu version, monte carlo, ising model, optimizing cpu implementation, fully-optimized gpu version, high-performance computing, explicit memory coalescing, explicit vectorization, original cpu version, performance, gpu, central processing unit, optimization, software performance, explicit memory
DocType
Journal
Volume
230
Issue
13
ISSN
Journal of Computational Physics
Citations
14
PageRank
1.59
References
10
Authors
3
Name
Order
Citations
PageRank
Neil Dickson1636.72
Kamran Karimi211817.23
Firas Hamze313114.05