Abstract |
---|
Stencil computations on regular grids are widely used in scientific simulations. Optimization techniques for such stencil computations typically exploit temporal locality across time steps. More complex stencil applications, like those in meteorology and seismic simulations, cannot easily take advantage of these techniques, since the number of physical fields and computation stages considered at each time step is large enough to flush all cached data before the next time step begins. In this paper we present a technique for improving the performance of such computations, based only on spatial tiling, which is implemented as a generic algorithm. More specifically, we investigate how to take advantage of producer-consumer relations between stencil loops, within a single time step, to improve memory hierarchy utilization. This approach makes it possible to balance computation and communication to improve resource usage. We implement our methods using generic programming constructs of C++ and compare them with hand-tuned implementations of the stencils. The results show that this technique can improve both single-threaded and multi-threaded performance to closely match that of hand-tuned implementations, with the convenience of a high-level specification. |
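
The producer-consumer idea in the abstract can be illustrated with a minimal C++ sketch. It is not the paper's generic algorithm: the `Field` type, the two-stage Laplacian pipeline, and the function names are all hypothetical, chosen only to show how a consumer stencil can read a tile-sized temporary filled by a producer stencil instead of a full-grid intermediate field.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major 2D field (illustration only, not the paper's data structure).
struct Field {
    std::size_t nx, ny;
    std::vector<double> data;
    Field(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[j * nx + i]; }
    double operator()(std::size_t i, std::size_t j) const { return data[j * nx + i]; }
};

// Five-point Laplacian at (i, j); requires 1 <= i <= nx-2 and 1 <= j <= ny-2.
inline double laplacian(const Field& f, std::size_t i, std::size_t j) {
    return f(i - 1, j) + f(i + 1, j) + f(i, j - 1) + f(i, j + 1) - 4.0 * f(i, j);
}

// Reference version: two full-grid loops joined by a full-size temporary.
void two_stage_unfused(const Field& in, Field& out) {
    assert(in.nx >= 5 && in.ny >= 5);
    Field lap(in.nx, in.ny);
    for (std::size_t j = 1; j < in.ny - 1; ++j)
        for (std::size_t i = 1; i < in.nx - 1; ++i)
            lap(i, j) = laplacian(in, i, j);
    for (std::size_t j = 2; j < in.ny - 2; ++j)
        for (std::size_t i = 2; i < in.nx - 2; ++i)
            out(i, j) = laplacian(lap, i, j);
}

// Spatially tiled version: for each output tile, the producer stage fills a
// small temporary covering the tile plus a one-cell halo, and the consumer
// stage reads it immediately. Halo cells are recomputed by neighbouring tiles.
void two_stage_tiled(const Field& in, Field& out, std::size_t tx, std::size_t ty) {
    assert(tx > 0 && ty > 0 && in.nx >= 5 && in.ny >= 5);
    Field tmp(tx + 2, ty + 2);  // tile-sized temporary, reused for every tile
    for (std::size_t j0 = 2; j0 < in.ny - 2; j0 += ty) {
        const std::size_t j1 = std::min(j0 + ty, in.ny - 2);
        for (std::size_t i0 = 2; i0 < in.nx - 2; i0 += tx) {
            const std::size_t i1 = std::min(i0 + tx, in.nx - 2);
            // Producer: Laplacian of the input on the tile plus its halo.
            for (std::size_t j = j0 - 1; j < j1 + 1; ++j)
                for (std::size_t i = i0 - 1; i < i1 + 1; ++i)
                    tmp(i - (i0 - 1), j - (j0 - 1)) = laplacian(in, i, j);
            // Consumer: Laplacian of the temporary on the tile interior.
            for (std::size_t j = j0; j < j1; ++j)
                for (std::size_t i = i0; i < i1; ++i)
                    out(i, j) = laplacian(tmp, i - i0 + 1, j - j0 + 1);
        }
    }
}
```

Both functions write the same values to `out`. The tiled version trades some redundant halo computation for a temporary small enough to stay in cache, which is one way to read the balance between computation and communication that the abstract mentions.
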
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/978-3-319-09873-9_49 | Lecture Notes in Computer Science |
DocType | Volume | ISSN |
Conference | 8632 | 0302-9743 |
Citations | PageRank | References |
3 | 0.39 | 7 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Mauro Bianco | 1 | 22 | 1.46 |
Benjamin Cumming | 2 | 3 | 0.39 |