Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities. - Citegraph

Paper Info

Title
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.

Abstract
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly partitioned and scheduled to execute on either the main, powerful GPU cores that are far away from memory or the auxiliary, simple GPU cores that are close to memory (e.g., in the logic layer of 3D-stacked DRAM). This paper investigates two key code scheduling issues in such a GPU architecture that has PIM capabilities, to maximize performance and energy efficiency: (1) how to automatically identify the code segments, or kernels, to be offloaded to the cores in memory, and (2) how to concurrently schedule multiple kernels on the main GPU cores and the auxiliary GPU cores in memory. We develop two new runtime techniques: (1) a regression-based affinity prediction model and mechanism that accurately identifies which kernels would benefit from PIM and offloads them to GPU cores in memory, and (2) a concurrent kernel management mechanism that uses the affinity prediction model, a new kernel execution time prediction model, and kernel dependency information to decide which kernels to schedule concurrently on main GPU cores and the GPU cores in memory. Our experimental evaluations across 25 GPU applications demonstrate that these two techniques can significantly improve both application performance (by 25% and 42%, respectively, on average) and energy efficiency (by 28% and 27%).
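
The abstract names the two mechanisms but not their internals. The sketch below shows one way they could fit together, under loudly stated assumptions: the affinity model is written as a logistic regression over two hypothetical static kernel features (mem_to_compute_ratio, num_ctas) with made-up coefficients in WEIGHTS; the per-target run times t_gpu and t_pim stand in for the output of an execution-time predictor; and schedule() is a generic greedy earliest-finish-time list scheduler swapped in for illustration. None of these names, values, or policies are taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Kernel:
    """A GPU kernel as seen by a hypothetical scheduler (illustrative only)."""
    name: str
    mem_to_compute_ratio: float  # hypothetical feature: memory ops per compute op
    num_ctas: int                # hypothetical feature: thread blocks launched
    t_gpu: float                 # assumed predicted run time on the main GPU cores
    t_pim: float                 # assumed predicted run time on the in-memory GPU cores
    deps: list[str] = field(default_factory=list)  # kernels that must finish first


# Made-up coefficients; a real model would learn these offline from profiling data.
WEIGHTS = {"bias": -2.0, "mem_to_compute_ratio": 3.0, "num_ctas": -0.001}


def pim_affinity(k: Kernel) -> float:
    """Logistic score in (0, 1): estimated probability that k runs better in memory."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["mem_to_compute_ratio"] * k.mem_to_compute_ratio
         + WEIGHTS["num_ctas"] * k.num_ctas)
    return 1.0 / (1.0 + math.exp(-z))


def offload_to_pim(k: Kernel, threshold: float = 0.5) -> bool:
    """Offload decision: prefer the in-memory cores if affinity clears the threshold."""
    return pim_affinity(k) >= threshold


def schedule(kernels: list[Kernel]) -> list[tuple[str, str, float, float]]:
    """Greedy earliest-finish-time list scheduling over two core groups.

    Returns (name, target, start, end) tuples in scheduling order.
    """
    free = {"gpu": 0.0, "pim": 0.0}  # time at which each core group becomes idle
    finish: dict[str, float] = {}     # predicted finish time of each scheduled kernel
    plan = []
    pending = list(kernels)
    while pending:
        # First kernel, in submission order, whose dependencies are all scheduled.
        k = next((c for c in pending if all(d in finish for d in c.deps)), None)
        if k is None:
            raise ValueError("dependency cycle or unknown dependency")
        pending.remove(k)
        ready = max((finish[d] for d in k.deps), default=0.0)
        preferred = "pim" if offload_to_pim(k) else "gpu"
        options = []
        for target, duration in (("gpu", k.t_gpu), ("pim", k.t_pim)):
            start = max(free[target], ready)
            # Sort key: earliest finish first; ties go to the affinity-preferred target.
            options.append((start + duration, target != preferred, target, start))
        end, _, target, start = min(options)
        free[target] = end
        finish[k.name] = end
        plan.append((k.name, target, start, end))
    return plan


if __name__ == "__main__":
    jobs = [
        Kernel("A", mem_to_compute_ratio=2.0, num_ctas=100, t_gpu=10, t_pim=6),
        Kernel("B", mem_to_compute_ratio=0.2, num_ctas=1000, t_gpu=5, t_pim=12),
        Kernel("C", mem_to_compute_ratio=1.0, num_ctas=200, t_gpu=4, t_pim=4,
               deps=["A", "B"]),
    ]
    for row in schedule(jobs):
        print(row)
```

With these toy inputs, A runs on the in-memory cores from 0 to 6 while B runs concurrently on the main GPU cores from 0 to 5; C waits for both and, since its predicted finish time (10) is the same on either side, the affinity score sends it to the in-memory cores. Choosing the earliest predicted finish lets a kernel spill onto the other core group when its preferred one is busy; this two-queue model ignores kernels sharing a core group, which a real mechanism would have to handle.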

Year: 2016
DOI: 10.1145/2967938.2967940
Venue: PACT
Keywords: GPU architectures, processing-in-memory capabilities, PIM, processor, graphics processing unit, memory bandwidth, 3D-stacked DRAM, key code scheduling, energy efficiency, auxiliary GPU cores, regression-based affinity prediction model, concurrent kernel management mechanism, kernel execution time prediction model, kernel dependency information, GPU applications
Field: DRAM, Bottleneck, Interleaved memory, Uniform memory access, Memory bandwidth, Scheduling (computing), Computer science, Parallel computing, Real-time computing, Memory management, Graphics processing unit
DocType: Conference
ISBN: 978-1-5090-5308-7
Citations: 52
PageRank: 0.88
References: 79
Authors: 8

Authors

Name	Order	Citations	PageRank
Ashutosh Pattnaik	1	113	4.70
Xulong Tang	2	128	7.49
Adwait Jog	3	568	23.32
Onur Kayıran	4	356	13.47
Asit K. Mishra	5	1216	46.21
Mahmut T. Kandemir	6	7371	568.54
Onur Mutlu	7	9446	357.40
Chita R. Das	8	1038	59.34
