Abstract |
---|
The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory-bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) compiler. Within BTO, an analytic memory model efficiently and accurately reduces the number of serial loop fusion options considered. In this paper, we extend the model to shared memory parallel machines. We detail the differences between parallel and serial memory use and runtime prediction, and explain the changes made to include parallel machines in the model. Analysis of the parallel model's predictions shows that, when it is included in BTO, it will reduce the search space of considered routines. |
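
The abstract relies on loop fusion as the way to cut data movement in memory-bound matrix code. As an illustration only (this is not BTO output and not the paper's memory model), here is a minimal sketch of fusing two matrix-vector kernels, y = A x and z = A^T w, that both traverse the same matrix. The function names `unfused` and `fused` are hypothetical, and the `loads` counter is a stand-in for memory traffic: it counts accesses to elements of A, which approximates main-memory traffic only under the assumption that A is too large to stay in cache between passes.

```python
def unfused(A, x, w):
    """Two separate loop nests: every element of A is accessed twice."""
    m, n = len(A), len(A[0])
    y = [0.0] * m
    z = [0.0] * n
    loads = 0  # number of accesses to elements of A
    # Loop nest 1: y = A x
    for i in range(m):
        for j in range(n):
            y[i] += A[i][j] * x[j]
            loads += 1
    # Loop nest 2: z = A^T w (a second full pass over A)
    for i in range(m):
        for j in range(n):
            z[j] += A[i][j] * w[i]
            loads += 1
    return y, z, loads


def fused(A, x, w):
    """One loop nest: each A[i][j] is read once and used by both products."""
    m, n = len(A), len(A[0])
    y = [0.0] * m
    z = [0.0] * n
    loads = 0
    for i in range(m):
        for j in range(n):
            a = A[i][j]  # single access to the matrix element
            loads += 1
            y[i] += a * x[j]   # contribution to y = A x
            z[j] += a * w[i]   # contribution to z = A^T w
    return y, z, loads


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2 matrix
    x = [1.0, 1.0]
    w = [1.0, 0.0, 1.0]
    print(unfused(A, x, w))  # ([3.0, 7.0, 11.0], [6.0, 8.0], 12)
    print(fused(A, x, w))    # ([3.0, 7.0, 11.0], [6.0, 8.0], 6)
```

Both versions compute the same results, but the fused loop touches A half as often. Deciding which of the many such fusion choices actually pays off on a given machine, serial or shared memory parallel, is the job the abstract assigns to BTO's analytic memory model.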
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/1964218.1964226 | SIGMETRICS Performance Evaluation Review |
Keywords | Field | DocType |
---|---|---|
parallel memory prediction, fused linear algebra kernel, parallel model, serial memory use, memory modeling, runtime prediction, analytic memory model, data movement, matrix computation, shared memory parallel machine, serial loop fusion option, parallel processing, parallel machine, auto-tuning, loop fusion, shared memory, memory model, search space, linear algebra | Loop fusion, Uniform memory access, Shared memory, Computer science, Parallel computing, Distributed memory, Memory model, Overlay, Flat memory model, CUDA Pinned memory | Journal |
Volume | Issue | Citations |
---|---|---|
38 | 4 | 2 |
PageRank | References | Authors |
---|---|---|
0.37 | 24 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ian Karlin | 1 | 95 | 12.30 |
Elizabeth R. Jessup | 2 | 23 | 3.07 |
Geoffrey Belter | 3 | 9 | 1.54 |
Jeremy G. Siek | 4 | 563 | 45.96 |