Title
Parallel memory prediction for fused linear algebra kernels
Abstract
The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory-bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) compiler. Within BTO, an analytic memory model efficiently and accurately reduces the number of serial loop fusion options considered. In this paper, we extend the model to shared-memory parallel machines. We detail the differences between parallel and serial memory use and runtime prediction, and explain the changes made to include parallel machines in the model. Analysis of the parallel model's predictions shows that, when it is included in BTO, it will reduce the search space of considered routines.
Year
2011
DOI
10.1145/1964218.1964226
Venue
SIGMETRICS Performance Evaluation Review
Keywords
parallel memory prediction, fused linear algebra kernel, parallel model, serial memory use, memory modeling, runtime prediction, analytic memory model, data movement, matrix computation, shared memory parallel machine, serial loop fusion option, parallel processing, parallel machine, auto-tuning, loop fusion, shared memory, memory model, search space, linear algebra
Field
Loop fusion, Uniform memory access, Shared memory, Computer science, Parallel computing, Distributed memory, Memory model, Overlay, Flat memory model, CUDA Pinned memory
DocType
Journal
Volume
38
Issue
4
Citations
2
PageRank
0.37
References
24
Authors
4
Name | Order | Citations | PageRank
Ian Karlin | 1 | 95 | 12.30
Elizabeth R. Jessup | 2 | 23 | 3.07
Geoffrey Belter | 3 | 9 | 1.54
Jeremy G. Siek | 4 | 563 | 45.96