Mascar: Speeding up GPU warps by reducing memory pitstops - Citegraph

Paper Info

Title
Mascar: Speeding up GPU warps by reducing memory pitstops

Abstract
With the prevalence of GPUs as throughput engines for data parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory intensive workloads are more scarce. Therefore, managing access to these limited memory resources is a challenge for GPUs. We propose a novel Memory Aware Scheduling and Cache Access Re-execution (Mascar) system on GPUs tailored for better performance for memory intensive workloads. This scheme detects memory saturation and prioritizes memory requests among warps to enable better overlapping of compute and memory accesses. Furthermore, it enables limited re-execution of memory instructions to eliminate structural hazards in the memory subsystem and take advantage of cache locality in cases where requests cannot be sent to the memory due to memory saturation. Our results show that Mascar provides a 34% speedup over the baseline round-robin scheduler and 10% speedup over the state of the art warp schedulers for memory intensive workloads. Mascar also achieves an average of 12% savings in energy for such workloads.

Year	2015
DOI	10.1109/HPCA.2015.7056031
Venue	High Performance Computer Architecture
DocType	Conference
ISSN	1530-0897
Citations	31
PageRank	0.90
References	28
Authors	3
Keywords	cache storage, graphics processing units, memory architecture, processor scheduling, gpu computing, gpu warp, mascar system, baseline round-robin scheduler, cache locality, data parallel workload, irregular access pattern, limited memory resource, memory access, memory aware scheduling and cache access reexecution system, memory instruction, memory intensity, memory intensive workload, memory pitstop, memory saturation, memory subsystem, nongraphics workload, structural hazard, warp scheduler
Field	Registered memory, Interleaved memory, Uniform memory access, Computer science, Parallel computing, Cache-only memory architecture, Real-time computing, Memory management, Non-uniform memory access, Memory map, CUDA Pinned memory

Authors (3 rows)

Name	Order	Citations	PageRank
Ankit Sethia	1	105	4.91
D. Anoushe Jamshidi	2	351	11.20
Scott Mahlke	3	4811	312.08
