Title
Fast Computational GPU Design with GT-Pin
Abstract
As computational applications become common for graphics processing units, new hardware designs must be developed to meet the unique needs of these workloads. Performance simulation is an important step in appraising how well a candidate design will serve these needs, but unfortunately, computational GPU programs are so large that simulating them in detail is prohibitively slow. This work addresses the need to understand very large computational GPU programs in three ways. First, it introduces a fast tracing tool that uses binary instrumentation for in-depth analyses of native executions on existing architectures. Second, it characterizes 25 commercial and benchmark OpenCL applications, which average 308 billion GPU instructions apiece and are by far the largest benchmarks that have been natively profiled at this level of detail. Third, it accelerates simulation of future hardware by pinpointing small subsets of OpenCL applications that can be simulated as representative surrogates in lieu of full-length programs. Our fast selection method requires no simulation itself and allows the user to navigate the accuracy/simulation speed trade-off space, from extremely accurate with reasonable speedups (35X increase in simulation speed for 0.3% error) to reasonably accurate with extreme speedups (223X simulation speedup for 3.0% error).
Year
2015
DOI
10.1109/IISWC.2015.14
Venue
IEEE International Symposium on Workload Characterization
Keywords
computer simulation, graphics processing unit, performance analysis
Field
Graphics, Level of detail, Computer science, Parallel computing, General-purpose computing on graphics processing units, Graphics processing unit, Tracing, Speedup, Binary number
DocType
Conference
Citations
9
PageRank
0.57
References
19
Authors
7
Name              Order  Citations  PageRank
Melanie Kambadur  1      44         3.08
Sunpyo Hong       2      9          0.57
Juan Cabral       3      9          0.57
Harish Patil      4      2269       106.86
Chi-Keung Luk     5      2537       116.49
Sohaib Sajid      6      9          0.57
Martha A. Kim     7      165        12.68