Title
Random Address Permute-Shift Technique for the Shared Memory on GPUs
Abstract
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access to the shared memory of a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and the w threads in a warp try to access them at the same time. However, memory access requests destined for the same memory bank are processed sequentially. Hence, to develop efficient algorithms, it is very important to reduce the memory access congestion, that is, the maximum number of memory access requests destined for the same bank. The main contribution of this paper is a novel algorithmic technique, called the random address permute-shift (RAP) technique, that reduces the memory access congestion. We show that the RAP technique reduces the memory access congestion to O̅(log w/log log w) for any memory access requests by a warp of w threads, including malicious ones. We can also guarantee that the congestion is 1 for both contiguous access and stride access. Simulation results for w=32 show that the expected congestion for any memory access is only 3.53. Since malicious memory access requests that are all destined for the same bank have congestion 32, our RAP technique substantially reduces the memory access congestion. We have also applied the RAP technique to matrix transpose algorithms. Experimental results on a GeForce GTX TITAN show that the RAP technique is practical and can accelerate a direct matrix transpose algorithm by a factor of 10.
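The congestion measure described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a w x w array stored row-major in shared memory, with address a mapped to bank a mod w, and it assumes that the permute-shift step rotates row i by sigma[i], where sigma is a random permutation of 0..w-1. That mapping is one choice consistent with the stated guarantee of congestion 1 for contiguous and stride access; the names `congestion`, `rap_address`, and `sigma` are hypothetical.

```python
import random
from collections import Counter

W = 32  # number of banks, and number of threads in a warp


def congestion(addresses, w=W):
    """Maximum number of requests in one warp that land in the same bank.

    Assumes address a lives in bank a % w.
    """
    per_bank = Counter(a % w for a in addresses)
    return max(per_bank.values())


def rap_address(i, j, sigma, w=W):
    """Address of element (i, j) when row i is rotated by sigma[i].

    Assumed mapping for illustration, not quoted from the paper.
    """
    return i * w + (j + sigma[i]) % w


# A random permutation of 0..W-1 (fixed seed so the run is repeatable).
rng = random.Random(0)
sigma = list(range(W))
rng.shuffle(sigma)

# Contiguous access: the warp reads one row. Stride access: one column.
row_plain = [5 * W + j for j in range(W)]
col_plain = [i * W + 5 for i in range(W)]
row_rap = [rap_address(5, j, sigma) for j in range(W)]
col_rap = [rap_address(i, 5, sigma) for i in range(W)]

print("plain row:", congestion(row_plain))
print("plain col:", congestion(col_plain))
print("RAP row:  ", congestion(row_rap))
print("RAP col:  ", congestion(col_rap))
```

Under these assumptions the unmodified column access sends all 32 requests to one bank, while the rotated layout spreads both the row and the column over 32 distinct banks, because the shifts sigma[i] are pairwise distinct.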
Year
2014
DOI
10.1109/ICPPW.2014.63
Venue
Parallel Processing Workshops
Keywords
graphics processing units,parallel architectures,shared memory systems,cuda-enabled gpu,dmm,geforce gtx titan show,rap technique,algorithmic technique,direct matrix transpose algorithm,discrete memory machine,memory access congestion,memory access request,memory bank,parallel computing model,random address permute-shift technique,shared memory,streaming multiprocessor,cuda,gpu,memory bank conflicts,randomized technique,instruction sets,writing,algorithm design and analysis,pipelines,memory management
Field
Registered memory,Interleaved memory,Uniform memory access,Physical address,Shared memory,Computer science,Parallel computing,Memory map,Distributed shared memory,Flat memory model,Distributed computing
DocType
Conference
ISSN
1530-2016
Citations
0
PageRank
0.34
References
9
Authors
3
Name, Order, Citations, PageRank
Koji Nakano, 1, 1165, 118.13
Susumu Matsumae, 2, 34, 13.19
Yasuaki Ito, 3, 511, 60.47