Title: Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling
Abstract: Modern GPGPUs support the concurrent execution of thousands of threads to provide an energy-efficient platform. However, the massive multi-threading of GPGPUs incurs serious cache contention, as the cache lines brought in by one thread can easily be evicted by other threads in the small shared cache. In this paper, we propose a software-hardware cooperative approach that exploits the spatial locality...
Year: 2017
DOI: 10.1109/LCA.2017.2693371
Venue: IEEE Computer Architecture Letters
Keywords: Instruction sets, Cache memory, Dispatching, Two dimensional displays, Graphics processing units
Field: Win32 Thread Information Block, Locality, Locality of reference, Shared memory, Computer science, Cache, Instruction set, Parallel computing, Thread (computing), Cache algorithms, Real-time computing
DocType: Journal
Volume: 16
Issue: 2
ISSN: 1556-6056
Citations: 3
PageRank: 0.36
References: 8
Authors: 4

Name              Order  Citations  PageRank
Li-Jhan Chen      1      3          0.36
Hsiang-Yun Cheng  2      61         6.07
Po-Han Wang       3      44         3.04
Chia-Lin Yang     4      1033       76.39