Title
Parallel data-locality aware stencil computations on modern micro-architectures
Abstract
Novel micro-architectures, including the Cell Broadband Engine Architecture (Cell BE) and graphics processing units (GPUs), are attractive platforms for compute-intensive simulations. This paper focuses on stencil computations arising in a biomedical simulation, presents performance benchmarks on both the Cell BE and GPUs, and contrasts them with a benchmark on a traditional CPU system. Because stencil computations have low arithmetic intensity, they typically reach only a fraction of the peak performance of the compute hardware. An algorithm is presented that reduces the bandwidth requirements, and thereby improves performance, by exploiting temporal locality of the data. We report on performance improvements over CPU implementations.
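The idea of trading bandwidth for temporal locality can be illustrated with a small sketch. The code below is not the algorithm from the paper; it is a minimal example of overlapped temporal blocking, assuming a 1D three-point averaging stencil with fixed boundary values. The function names `naive_sweeps` and `blocked_sweeps` and the parameters `steps` and `tile` are hypothetical and chosen only for illustration. The naive version streams the whole array through memory once per time step, while the blocked version loads each tile plus a halo of `steps` cells once and advances it by `steps` time steps locally, at the cost of some redundant computation in the halo.

```python
def naive_sweeps(u, steps):
    """Apply a 3-point averaging stencil `steps` times; endpoints stay fixed.

    Every time step reads and writes the entire array once.
    """
    u = list(u)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = new
    return u


def blocked_sweeps(u, steps, tile):
    """Same result as naive_sweeps, computed tile by tile.

    Each tile is loaded once together with a halo of `steps` cells per side
    (clipped at the array ends) and advanced `steps` time steps locally.
    Halo cells become stale by one cell per step, so after `steps` steps
    exactly the tile itself is still valid and can be written back.
    """
    n = len(u)
    out = list(u)
    for start in range(1, n - 1, tile):
        stop = min(start + tile, n - 1)   # tile covers indices [start, stop)
        lo = max(start - steps, 0)        # left halo, clipped at the boundary
        hi = min(stop + steps, n)         # right halo, clipped at the boundary
        local = u[lo:hi]                  # single load from "main memory"
        for _ in range(steps):
            new = local[:]
            for j in range(1, len(local) - 1):
                new[j] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
            local = new
        out[start:stop] = local[start - lo:stop - lo]
    return out


if __name__ == "__main__":
    data = [float((i * 7) % 5) for i in range(40)]
    a = naive_sweeps(data, 4)
    b = blocked_sweeps(data, 4, 8)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

In this sketch, a larger `steps` value means fewer passes over main memory but a wider halo and more redundant work per tile, which is the basic trade-off any temporal-blocking scheme has to balance.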
Year
2009
DOI
10.1109/IPDPS.2009.5161031
Venue
IPDPS
Keywords
performance improvement, cpu implementation, cell broadband engine architecture, biomedical simulation, performance benchmarks, attractive platform, modern micro-architectures, traditional cpu system, stencil computation, parallel data-locality aware stencil, bandwidth requirement, peak performance, arithmetic, graphics, probability density function, context modeling, engines, coprocessors, bandwidth, heating, parallel processing, computer architecture, central processing unit, computational modeling, concurrent computing, data mining, hardware, pipelines
Field
Graphics, Central processing unit, Locality, Locality of reference, Computer science, Parallel computing, Stencil, Bandwidth (signal processing), Coprocessor, Concurrent computing
DocType
Conference
ISSN
1530-2075
Citations
12
PageRank
1.36
References
8
Authors
5
Name               Order  Citations  PageRank
Matthias Christen  1      190        10.20
Olaf Schenk        2      536        39.02
Esra Neufeld       3      13         3.24
Peter Messmer      4      46         7.45
Helmar Burkhart    5      304        42.97