Parallel memory prediction for fused linear algebra kernels - Citegraph

Paper Info

Title
Parallel memory prediction for fused linear algebra kernels

Abstract
The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory-bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) compiler. Within BTO, an analytic memory model efficiently and accurately reduces the number of serial loop fusion options considered. In this paper, we extend the model to shared-memory parallel machines. We detail the differences between parallel and serial memory use and runtime prediction and explain the changes made to include parallel machines in the model. Analysis of the parallel model's predictions shows that when it is included in BTO it will reduce the search space of considered routines.

Year
2011

DOI
10.1145/1964218.1964226

Venue
SIGMETRICS Performance Evaluation Review

Keywords
parallel memory prediction, fused linear algebra kernel, parallel model, serial memory use, memory modeling, runtime prediction, analytic memory model, data movement, matrix computation, shared memory parallel machine, serial loop fusion option, parallel processing, parallel machine, auto-tuning, loop fusion, shared memory, memory model, search space, linear algebra

Field
Loop fusion, Uniform memory access, Shared memory, Computer science, Parallel computing, Distributed memory, Memory model, Overlay, Flat memory model, CUDA Pinned memory

DocType
Journal

Volume
38

Issue
4

Citations
2

PageRank
0.37

References
24

Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Ian Karlin	1	95	12.30
Elizabeth R. Jessup	2	23	3.07
Geoffrey Belter	3	9	1.54
Jeremy G. Siek	4	563	45.96
