Title
A run-time optimization approach for reducing data movements using locality-aware searching
Abstract
The CPU–GPU communication bottleneck limits the performance improvement of GPU applications in heterogeneous GPGPU systems and is usually handled by data reuse optimization. This paper analyzes data reuse through a DAG abstraction and derives rules showing that run-time data reuse optimization can effectively relieve the bottleneck. Based on these rules, the paper proposes a run-time optimization framework for data reuse, called R-Tracker. R-Tracker uses a locality-aware searching approach to handle reuses. It not only implements the data reuse optimization at low cost but also effectively performs the searching, the data transfers, and the GPU computation concurrently. R-Tracker relaxes the constraints required by compiler-based approaches and thus achieves a better reuse effect. The experimental results show that R-Tracker improves performance by 1.77–16.42% over the compiler-based approach OpenMPC and by 1.40–8.39% over CGCM in single-node execution, and by 48.78–60% over CGCM in multi-node execution.
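The abstract does not describe R-Tracker's internals, so the following is only a minimal sketch of the general idea of run-time data reuse: skip a host-to-device copy when a valid copy of the buffer is already resident on the GPU. It is a pure-Python simulation and not the paper's algorithm. In particular, it does not model the locality-aware searching or the overlap of searching, transfers, and computation. All names (`ReuseTracker`, `launch`, `host_write`, `run_demo`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReuseTracker:
    """Toy run-time tracker of which buffers have an up-to-date device copy.

    Hypothetical illustration only; not the R-Tracker implementation.
    """
    valid_on_device: set = field(default_factory=set)
    transfers: int = 0  # host-to-device copies actually issued (simulated)
    reused: int = 0     # copies skipped because the device copy was still valid

    def launch(self, reads, writes):
        """Record a kernel launch that reads and writes the named buffers."""
        for name in reads:
            if name in self.valid_on_device:
                self.reused += 1      # data already resident: reuse it
            else:
                self.transfers += 1   # simulate a host-to-device copy
                self.valid_on_device.add(name)
        # Kernel outputs are produced on the device, so they are resident too.
        self.valid_on_device.update(writes)

    def host_write(self, name):
        """The CPU modified the buffer, so any device copy is now stale."""
        self.valid_on_device.discard(name)


def run_demo():
    tracker = ReuseTracker()
    tracker.launch(reads=["a", "b"], writes=["c"])  # a and b are copied
    tracker.launch(reads=["a", "c"], writes=["d"])  # a and c are reused
    tracker.host_write("a")                         # CPU updates a
    tracker.launch(reads=["a", "d"], writes=["e"])  # a is copied again, d is reused
    return tracker.transfers, tracker.reused


if __name__ == "__main__":
    print(run_demo())
```

In this toy trace, six buffer reads result in three simulated transfers and three reuses. The decision is made at launch time from the actual access history rather than from static analysis, which is the property the abstract attributes to run-time approaches in general.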
Year
2014
DOI
10.1007/s11227-014-1186-x
Venue
The Journal of Supercomputing
Keywords
CPU–GPU, Run-time optimization, Dynamic searching, Data reuse
Field
Bottleneck, Locality, Abstraction, Reuse, Computer science, Parallel computing, Compiler, General-purpose computing on graphics processing units, Distributed computing, Performance improvement, Computation
DocType
Journal
Volume
69
Issue
2
ISSN
0920-8542
Citations
0
PageRank
0.34
References
17
Authors
6
Name            Order  Citations  PageRank
Liang Li        1      5          1.85
Endong Wang     2      7          5.62
Xingjun Zhang   3      81         34.06
Kang Yan        4      0          0.34
Tao Ju          5      3          1.76
Xiaoshe Dong    6      172        51.44