Title
WarpPool: sharing requests with inter-warp coalescing for throughput processors
Abstract
Although graphics processing units (GPUs) are capable of high compute throughput, their memory systems need to supply the arithmetic pipelines with data at a sufficient rate to avoid stalls. For benchmarks that have divergent access patterns or cause the L1 cache to run out of resources, the link between the GPU's load/store unit and the L1 cache becomes a bottleneck in the memory system, leading to low utilization of compute resources. While current GPU memory systems are able to coalesce requests between threads in the same warp, we identify a form of spatial locality between threads in multiple warps. We use this locality, which is overlooked in current systems, to merge requests being sent to the L1 cache. This relieves the bottleneck between the load/store unit and the cache, and provides an opportunity to prioritize requests to minimize cache thrashing. Our implementation, WarpPool, yields a 38% speedup on memory throughput-limited kernels by increasing throughput to the L1 by 8% and reducing the number of L1 misses by 23%. We also demonstrate that WarpPool can improve GPU programmability by achieving high performance without the need to optimize workloads' memory access patterns. A Verilog implementation including place-and-route shows WarpPool requires 1.0% added GPU area and 0.8% added power.
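Illustration (not from the paper): the abstract's key observation is spatial locality between threads in different warps rather than within one warp. The following is a minimal CUDA sketch of an access pattern with that property; the kernel name, data layout, and launch parameters are illustrative assumptions, not the paper's benchmarks. Each lane of a warp touches a different 128-byte cache line, so intra-warp coalescing cannot merge the requests, while the same lane in consecutive warps touches adjacent words of one line, which is the kind of inter-warp sharing the abstract describes WarpPool exploiting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative only: this kernel is not from the WarpPool paper.
// With 4-byte floats, a 128-byte L1 cache line holds 32 elements.
__global__ void inter_warp_locality(const float *in, float *out, int num_warps) {
    int warp = threadIdx.x / 32;   // warp index within the block
    int lane = threadIdx.x % 32;   // lane index within the warp

    // Lanes 0..31 of one warp read addresses num_warps floats apart, so each
    // lane touches a different cache line and intra-warp coalescing cannot
    // merge them. Lane l of warp w and lane l of warp w+1 read adjacent
    // floats, i.e. the same cache line: inter-warp spatial locality.
    int idx = lane * num_warps + warp;
    out[idx] = in[idx] * 2.0f;
}

int main() {
    const int num_warps = 32;          // 32 warps -> 1024 threads in one block
    const int n = 32 * num_warps;      // one element per (lane, warp) pair
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    inter_warp_locality<<<1, 32 * num_warps>>>(in, out, num_warps);
    cudaDeviceSynchronize();

    printf("out[1] = %f\n", out[1]);   // expect 2.0
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```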
Year
2015
DOI
10.1145/2830772.2830830
Venue
MICRO
Keywords
GPGPU, memory coalescing, memory divergence
Field
Uniform memory access, Cache pollution, Computer science, Cache, Parallel computing, Cache-only memory architecture, Cache algorithms, Real-time computing, Page cache, Non-uniform memory access, Cache coloring, Operating system
DocType
Conference
ISBN
978-1-5090-6601-8
Citations
6
PageRank
0.44
References
21
Authors
7
Name                  Order  Citations  PageRank
John Kloosterman      1      7          0.78
Jonathan Beaumont     2      36         2.85
Mick Wollman          3      6          0.44
Ankit Sethia          4      105        4.91
Ronald G. Dreslinski  5      1258       81.02
Trevor Mudge          6      6139       659.74
Scott Mahlke          7      4811       312.08