Title
Enhancing Address Translations in Throughput Processors via Compression
Abstract
Efficient memory sharing among multiple compute engines plays an important role in shaping overall application performance on CPU-GPU heterogeneous platforms. Unified Virtual Memory (UVM) is a promising feature that provides globally visible data structures and pointers, so that the GPU can access physical memory on the CPU side and take advantage of the host OS paging mechanism without explicit programmer effort. However, a key requirement for guaranteed performance is effective hardware support for address translation. In particular, we observe that GPU execution suffers from high TLB miss rates in a UVM environment, especially for irregular and/or memory-intensive applications. In this paper, we propose simple yet effective compression mechanisms for address translations to improve GPU TLB hit rates. Specifically, we explore and leverage TLB compressibility during the execution of GPU applications to design efficient address translation compression with minimal runtime overhead. Experimental results across 22 applications indicate that our proposed approach significantly improves GPU TLB hit rates, which translates to a 12% average performance improvement. In particular, for 16 irregular and/or memory-intensive applications, the performance improvements reach up to 69.2%, with an average of 16.3%.
Year: 2020
DOI: 10.1145/3410463.3414633
Venue: PACT '20: International Conference on Parallel Architectures and Compilation Techniques, Virtual Event, GA, USA, October 2020
DocType: Conference
ISBN: 978-1-4503-8075-1
Citations: 1
PageRank: 0.35
References: 0
Authors: 6
Name, Order, Citations/PageRank
Xulong Tang, 1, 54.79
Ziyu Zhang, 2, 11210.19
Weizheng Xu, 3, 11.02
Mahmut Taylan Kandemir, 4, 3811.03
Rami Melhem, 5, 2537164.09
Jun Yang, 6, 435.20