Abstract |
---|
Optimizing sophisticated PDE-based filtering methods, such as Anisotropic Nonlinear Diffusion (AND), for GPUs is complicated and time-consuming. In this work, we expressed AND as an iterative sequence of 3D stencils, where each 3D stencil is implemented as one kernel, and then analyzed all possible kernel fusions on the GPU. We found experimentally that fusing dependent stencils with similar concurrency and lower on-chip pressure yields the optimal combination, which runs 1.52× faster than the next-best one. |
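
To illustrate what fusing two dependent stencils means, here is a minimal sketch under simplifying assumptions: it runs on the CPU in plain Python, uses 1D arrays instead of 3D volumes, and the two stencils (`stencil_a`, `stencil_b`) are hypothetical stand-ins rather than the actual AND stages or the authors' kernels. The unfused version mimics two kernel launches that communicate through an intermediate array; the fused version mimics a single kernel that recomputes the intermediate values it needs.

```python
def stencil_a(u, i):
    """Hypothetical first stencil: 3-point average around index i."""
    return (u[i - 1] + u[i] + u[i + 1]) / 3.0


def stencil_b(v_left, v_right):
    """Hypothetical second stencil: central difference of stencil_a's output."""
    return (v_right - v_left) / 2.0


def unfused(u):
    """Two passes ("kernels") linked by a full intermediate array."""
    n = len(u)
    tmp = [0.0] * n
    # Pass 1: write every intermediate value (global memory traffic on a GPU).
    for i in range(1, n - 1):
        tmp[i] = stencil_a(u, i)
    out = [0.0] * n
    # Pass 2: read the intermediate values back.
    for i in range(2, n - 2):
        out[i] = stencil_b(tmp[i - 1], tmp[i + 1])
    return out


def fused(u):
    """One pass: intermediate values are recomputed on the fly, never stored."""
    n = len(u)
    out = [0.0] * n
    for i in range(2, n - 2):
        out[i] = stencil_b(stencil_a(u, i - 1), stencil_a(u, i + 1))
    return out


if __name__ == "__main__":
    u = [float(i * i) for i in range(10)]
    # Both variants compute the same result; only the data movement differs.
    print(unfused(u) == fused(u))
```

In this sketch the fused loop avoids the round trip through `tmp` but evaluates `stencil_a` twice per output point and reads a wider neighborhood of `u`. On a GPU, that extra per-thread work and state is one plausible source of the on-chip pressure the abstract mentions, which is why not every fusion pays off.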
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/CLUSTER.2014.6968786 | Cluster Computing |
Keywords | Field | DocType |
filtering theory,graphics processing units,image processing,3D image filtering,AND,GPU,PDE based filtering methods,anisotropic nonlinear diffusion,onchip pressure | Kernel (linear algebra),Anisotropy,Instruction set,Concurrency,Computer science,Nonlinear diffusion,Parallel computing,Filter (signal processing),Anisotropic filtering,Multi-core processor | Conference |
ISSN | Citations | PageRank |
1552-5244 | 2 | 0.39 |
References | Authors |
12 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
S. Tabik | 1 | 43 | 5.54 |
Alin Murarasu | 2 | 4 | 0.77 |
Luis Felipe Romero | 3 | 2 | 0.39 |