Title |
---|
Adaptive Page Migration for Irregular Data-intensive Applications under GPU Memory Oversubscription |

Abstract |
---|
Unified Memory in heterogeneous systems serves a wide range of applications. However, the limited capacity of device memory becomes a first-order performance bottleneck for data-intensive general-purpose applications with growing working sets. The performance overhead under memory oversubscription depends on the memory access pattern of the workload. A regular application with sequential, dense memory accesses suffers from long-latency write-backs, whereas the performance of an irregular application with sparse, infrequent accesses to large data sets degrades due to page thrashing. Although smart spatio-temporal prefetching and large-page eviction yield good performance in general, remote zero-copy access to host-pinned memory proves beneficial for irregular, data-intensive applications. Further, newer-generation GPUs introduce hardware access counters to delay page migration and reduce memory thrashing. However, deciding which strategy best fits a given application falls heavily on the programmer and requires a thorough understanding of the memory access pattern, usually obtained through intrusive profiling. In this work, we propose a programmer-agnostic runtime that leverages the hardware access counters to automatically categorize memory allocations based on their access pattern and frequency. The proposed heuristic adaptively switches between remote zero-copy access to host-pinned memory and first-touch page migration based on the trade-off between low-latency remote access and high-bandwidth local access. We show that, although designed to address memory oversubscription, our scheme has no impact on performance when working sets fit in device-local memory. Experimental results show that our scheme improves performance by 22% to 78% for irregular applications under 125% memory oversubscription compared to the state of the art. At the same time, regular applications are not impacted by the framework. |
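
The abstract describes a runtime that reads access counters to decide, per allocation, between remote zero-copy access to host-pinned memory and first-touch page migration. The paper's actual heuristic and thresholds are not given on this page, so the following is only a minimal Python sketch under stated assumptions: the page size, `hot_threshold`, `dense_fraction`, and every name (`Allocation`, `classify`, `Placement`) are hypothetical, and a software `Counter` stands in for the GPU's hardware access counters.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Placement(Enum):
    HOST_PINNED_ZERO_COPY = "host-pinned (remote zero-copy)"
    FIRST_TOUCH_MIGRATION = "first-touch migration"


# Assumed tracking granularity in bytes; hypothetical, not taken from the paper.
PAGE_SIZE = 64 * 1024


@dataclass
class Allocation:
    name: str
    num_pages: int
    # page index -> recorded accesses (software stand-in for hardware access counters)
    counters: Counter = field(default_factory=Counter)

    def record(self, offset: int) -> None:
        """Count one access at byte `offset` within this allocation."""
        page = offset // PAGE_SIZE
        if not 0 <= page < self.num_pages:
            raise ValueError(f"offset {offset} outside allocation {self.name!r}")
        self.counters[page] += 1


def classify(alloc: Allocation, hot_threshold: int = 8,
             dense_fraction: float = 0.5) -> Placement:
    """Hypothetical heuristic: a page is 'hot' once its count reaches hot_threshold.

    If hot pages cover at least dense_fraction of the allocation, high-bandwidth
    local access should amortize the migration cost, so pages migrate on first
    touch. Otherwise accesses are sparse and infrequent, so the allocation stays
    pinned in host memory and is accessed remotely (zero-copy) to avoid thrashing.
    """
    if alloc.num_pages <= 0:
        raise ValueError("allocation must span at least one page")
    hot_pages = sum(1 for count in alloc.counters.values() if count >= hot_threshold)
    if hot_pages / alloc.num_pages >= dense_fraction:
        return Placement.FIRST_TOUCH_MIGRATION
    return Placement.HOST_PINNED_ZERO_COPY


# Usage: a small, densely and repeatedly accessed array vs. a large, sparsely touched one.
dense = Allocation("dense_array", num_pages=4)
for _ in range(10):
    for page in range(4):
        dense.record(page * PAGE_SIZE)

sparse = Allocation("sparse_graph", num_pages=1000)
for page in range(0, 1000, 50):
    sparse.record(page * PAGE_SIZE)

print(classify(dense).value)   # first-touch migration
print(classify(sparse).value)  # host-pinned (remote zero-copy)
```

The decision is made per allocation rather than per page, mirroring the abstract's "categorize memory allocations", so one sparse structure can stay on the host while dense arrays still migrate. On real hardware, a host-resident placement could plausibly be expressed with CUDA's `cudaMemAdvise` hints (`cudaMemAdviseSetPreferredLocation` with `cudaCpuDeviceId`, plus `cudaMemAdviseSetAccessedBy`), but this page does not say whether the authors' runtime uses them.
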

Year | DOI | Venue |
---|---|---|
2020 | 10.1109/IPDPS47924.2020.00054 | 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) |

Keywords | DocType | ISSN |
---|---|---|
page migration, pinning, memory management, CPU-GPU, Unified Memory | Conference | 1530-2075 |

ISBN | Citations | PageRank |
---|---|---|
978-1-7281-6876-0 | 1 | 0.35 |

References | Authors |
---|---|
0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Debashis Ganguly | 1 | 12 | 3.33 |
Ziyu Zhang | 2 | 112 | 10.19 |
Jun Yang | 3 | 43 | 5.20 |
R. G. Melhem | 4 | 178 | 18.89 |