Abstract | ||
---|---|---|
Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state-of-practice to find high-performance polyhedral transformations especially for different hardware targets still largely involves auto-tuning. In this work we propose a novel polyhedral scheduling technique, with the aim to reduce the need for auto-tuning while allowing to build customizable and specific transformation strategies. We design constraints and objectives that model several crucial aspects of performance such as stride optimization or the trade-off between parallelism and reuse, while taking into account important architectural features of the target machine. The developed set of objectives embody a Performance Vocabulary for loop transformations. The goal is to use this vocabulary, consisting of performance idioms, to construct transformation recipes adapted to a number of program classes. We evaluate our work using the PolyBench/C benchmark suite and experimentally validate it against large optimization spaces generated with the Pluto compiler on a 10-core Intel Core-i9 (Skylake-X). Our results show that we can achieve comparable or superior performance to Pluto on the majority of benchmarks, without implementing tiling in the source code nor using experimental autotuning. |
Year | Venue | DocType |
---|---|---|
2018 | arXiv: Distributed, Parallel, and Cluster Computing | Journal |
Volume | Citations | PageRank |
abs/1811.06043 | 0 | 0.34 |
References | Authors | |
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Martin Kong | 1 | 89 | 6.18 |
Louis-noël Pouchet | 2 | 880 | 47.61 |