Random Address Permute-Shift Technique for the Shared Memory on GPUs - Citegraph

Paper Info

Title
Random Address Permute-Shift Technique for the Shared Memory on GPUs

Abstract
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access to the shared memory of a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and the w threads in a warp try to access them at the same time. However, memory access requests destined for the same memory bank are processed sequentially. Hence, to develop efficient algorithms it is very important to reduce the memory access congestion, that is, the maximum number of memory access requests destined for the same bank. The main contribution of this paper is a novel algorithmic technique, called the random address permute-shift (RAP) technique, that reduces the memory access congestion. We show that the RAP reduces the memory access congestion to O(log w / log log w) for any memory access requests by a warp of w threads, including malicious ones. We can also guarantee that the congestion is 1 both for contiguous access and for stride access. The simulation results for w=32 show that the expected congestion for any memory access is only 3.53. Since malicious memory access requests that are all destined for the same bank have congestion 32, our RAP technique substantially reduces the memory access congestion. We have also applied the RAP technique to matrix transpose algorithms. The experimental results on a GeForce GTX TITAN show that the RAP technique is practical and can accelerate a direct matrix transpose algorithm by a factor of 10.
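To make the notion of congestion concrete, the following Python sketch simulates w = 32 banks and compares a conventional interleaved layout with a permute-shift layout. The abstract does not spell out the mapping, so the sketch assumes a w x w array in which row i is rotated by sigma[i], where sigma is a random permutation of 0..w-1. The names `congestion`, `plain_bank`, and `make_rap_bank` are hypothetical, and the code is an illustration under that assumption rather than the authors' implementation.

```python
import random
from collections import Counter

W = 32  # number of banks and of threads per warp, as in the abstract's simulation


def congestion(addresses, bank_of):
    """Largest number of requests in one warp access that hit the same bank."""
    # Assumption: requests to the same address are merged and count once.
    counts = Counter(bank_of(a) for a in set(addresses))
    return max(counts.values())


def plain_bank(addr):
    """Conventional interleaved layout: address a lives in bank a mod W."""
    return addr % W


def make_rap_bank(rng):
    """Assumed permute-shift layout: row i of a W x W array is rotated by sigma[i]."""
    sigma = list(range(W))
    rng.shuffle(sigma)  # random permutation of 0..W-1

    def rap_bank(addr):
        row, col = divmod(addr, W)
        return (col + sigma[row % W]) % W

    return rap_bank


rap_bank = make_rap_bank(random.Random(0))

contiguous = [5 * W + j for j in range(W)]  # one row: addresses 160..191
stride = [i * W + 7 for i in range(W)]      # one column: stride-W access

print(congestion(contiguous, plain_bank), congestion(stride, plain_bank))  # 1 32
print(congestion(contiguous, rap_bank), congestion(stride, rap_bank))      # 1 1
```

Under this assumed mapping, a row access stays conflict-free because a rotation keeps the w columns in distinct banks, and a column access becomes conflict-free because the w rows receive w distinct shifts. That matches the abstract's claim of congestion 1 for both contiguous and stride access.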

Year	DOI	Venue
2014	10.1109/ICPPW.2014.63	Parallel Processing Workshops
Keywords	Field	DocType
graphics processing units,parallel architectures,shared memory systems,cuda-enabled gpu,dmm,geforce gtx titan show,rap technique,algorithmic technique,direct matrix transpose algorithm,discrete memory machine,memory access congestion,memory access request,memory bank,parallel computing model,random address permute-shift technique,shared memory,streaming multiprocessor,cuda,gpu,memory bank conflicts,randomized technique,instruction sets,writing,algorithm design and analysis,pipelines,memory management	Registered memory,Interleaved memory,Uniform memory access,Physical address,Shared memory,Computer science,Parallel computing,Memory map,Distributed shared memory,Flat memory model,Distributed computing	Conference
ISSN	Citations	PageRank
1530-2016	0	0.34
References	Authors
9	3

Authors (3 rows)

Name	Order	Citations	PageRank
Koji Nakano	1	1165	118.13
Susumu Matsumae	2	34	13.19
Yasuaki Ito	3	511	60.47
